Practitioner's Docket No. NEB-163-PUS 



533 Rec'd PCT/PTO 13 SEP 2001



CHAPTER II 



Preliminary Classification: 
Proposed Class: 
Subclass: 

NOTE: "All applicants are requested to include a preliminary classification on newly filed patent
applications. The preliminary classification, preferably class and subclass designations, should be
identified in the upper right-hand corner of the letter of transmittal accompanying the application
papers, for example 'Proposed Class 2, subclass 129.'" M.P.E.P. § 601, 7th ed.
TRANSMITTAL LETTER
TO THE UNITED STATES ELECTED OFFICE (EO/US) 
(ENTRY INTO U.S. NATIONAL PHASE UNDER CHAPTER II) 

PCT/US00/14122                   23 May 2000                   24 May 1999
INTERNATIONAL APPLICATION NO.    INTERNATIONAL FILING DATE     PRIORITY DATE CLAIMED

METHOD FOR GENERATING SPLIT, NON-TRANSFERABLE GENES THAT ARE
ABLE TO EXPRESS AN ACTIVE PROTEIN PRODUCT
TITLE OF INVENTION

APPLICANT(S)



Box PCT 

Assistant Commissioner for Patents 
Washington D.C. 20231 
ATTENTION: EO/US 



CERTIFICATION UNDER 37 C.F.R. §§ 1.8(a) and 1.10* 

(When using Express Mail, the Express Mail label number is mandatory; 
Express Mail certification is optional.) 

I hereby certify that, on the date shown below, this correspondence is being: 

MAILING 

[x] deposited with the United States Postal Service in an envelope addressed to the Assistant Commissioner
for Patents, Washington, D.C. 20231

37 C.F.R. § 1.8(a)                                37 C.F.R. § 1.10*

□ with sufficient postage as first class mail.    [x] as "Express Mail Post Office to Addressee"
                                                  Mailing Label No. EL010481545US (mandatory)

TRANSMISSION

□ facsimile transmitted to the Patent and Trademark Office, (703) ______



Date: 9-13-01

Signature

Melissa A. Jackson
(type or print name of person certifying)



* Only the date of filing (§ 1.6) will be the date used in a patent term adjustment calculation, although the date 
on any certificate of mailing or transmission under § 1.8 continues to be taken into account in determining 
timeliness. See § 1.703(f). Consider "Express Mail Post Office to Addressee" (§ 1.10) or facsimile transmission
(§ 1.6(d)) for the reply to be accorded the earliest possible filing date for patent term adjustment calculations. 
NOTE: To avoid abandonment of the application, the applicant shall furnish to the USPTO, not later than 30
months from the priority date: (1) a copy of the international application, unless it has been previously
communicated by the International Bureau or unless it was originally filed in the USPTO; and (2) the
basic national fee (see 37 C.F.R. § 1.492(a)). The 30-month time limit may not be extended. 37 C.F.R.
§ 1.495.

WARNING: Where the items that can be submitted to complete the entry of the international
application into the national phase are submitted subsequent to 30 months from the priority date, the
application is still considered to be in the international stage, and if mailing procedures are utilized
to obtain a date, the express mail procedure of 37 C.F.R. § 1.10 must be used (since international
application papers are not covered by an ordinary certificate of mailing; see 37 C.F.R. § 1.8).

NOTE: Documents and fees must be clearly identified as a submission to enter the national stage under 35
U.S.C. § 371; otherwise the submission will be considered as being made under 35 U.S.C. § 111. 37
C.F.R. § 1.494(f).

I. Applicant herewith submits to the United States Elected Office (EO/US) the following 
items under 35 U.S.C. § 371: 

a. [x] This express request to immediately begin national examination procedures
(35 U.S.C. § 371(f)).

b. [x] The U.S. National Fee (35 U.S.C. § 371(c)(1)) and other fees (37 C.F.R. § 1.492)
as indicated below:
CLAIMS FEE

(1) FOR                          (2) NUMBER FILED   (3) NUMBER EXTRA   (4) RATE      (5) CALCULATIONS

□* TOTAL CLAIMS                  40 - 20 =          20                 x $18.00 =    $360.00

   INDEPENDENT CLAIMS            5 - 3 =            2                  x $80.00 =    $160.00

   MULTIPLE DEPENDENT CLAIM(S) (if applicable)                         + $270.00     $270.00
BASIC FEE**

[x] U.S. PTO WAS INTERNATIONAL PRELIMINARY EXAMINATION AUTHORITY

Where an international preliminary examination fee as set forth
in § 1.482 has been paid on the international application to the
U.S. PTO:

[x] and the international preliminary examination report
states that the criteria of novelty, inventive step
(non-obviousness) and industrial applicability, as defined in PCT
Article 33(1) to (4), have been satisfied for all the
claims presented in the application entering the
national stage (37 C.F.R. § 1.492(a)(4)) ........ $100.00

□ and the above requirements are not met (37 C.F.R.
§ 1.492(a)(1)) ........ $690.00

Basic fee entered: $100.00




□ U.S. PTO WAS NOT INTERNATIONAL PRELIMINARY 
EXAMINATION AUTHORITY 

Where no international preliminary examination fee as set forth 
in § 1.482 has been paid to the U.S. PTO, and payment of an 
international search fee as set forth in § 1.445(a)(2) to the U.S. 
PTO: 

□ has been paid (37 C.F.R. § 1.492(a)(2)) $710.00 

□ has not been paid (37 C.F.R. § 1.492(a)(3)) $1000.00 

□ where a search report on the international application 
has been prepared by the European Patent Office or 
the Japanese Patent Office (37 C.F.R. 






Total of above Calculations                                   $890.00

SMALL ENTITY
Reduction by 1/2 for filing by small entity, if applicable.
Assertion must be made. (Note 37 C.F.R. § 1.27)               $445.00

Subtotal                                                      $445.00

Total National Fee                                            $445.00

Fee for recording the enclosed assignment document $40.00 (37
C.F.R. § 1.21(h)). (See Item 13 below). See attached "ASSIGNMENT
COVER SHEET".

TOTAL
Total Fees enclosed                                           $445.00
*See attached Preliminary Amendment Reducing the Number of Claims.
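
The arithmetic in the fee table can be checked with a short sketch. The rates and thresholds below are the ones printed on this form (they are not current USPTO fees), the function name `national_fee` is hypothetical, and the figure of 40 claims filed is inferred from the 20 extra claims shown in the table rather than read directly from it.

```python
from decimal import Decimal

# Rates as printed on this transmittal form; NOT current USPTO fees.
RATE_PER_EXTRA_CLAIM = Decimal("18.00")        # each claim over 20
RATE_PER_EXTRA_INDEPENDENT = Decimal("80.00")  # each independent claim over 3
MULTIPLE_DEPENDENT_FEE = Decimal("270.00")     # flat fee if any are present


def national_fee(total_claims, independent_claims, multiple_dependent,
                 basic_fee, small_entity):
    """Hypothetical helper reproducing the form's fee calculation."""
    extra_total = max(total_claims - 20, 0)
    extra_independent = max(independent_claims - 3, 0)
    fee = (extra_total * RATE_PER_EXTRA_CLAIM
           + extra_independent * RATE_PER_EXTRA_INDEPENDENT)
    if multiple_dependent:
        fee += MULTIPLE_DEPENDENT_FEE
    fee += basic_fee
    if small_entity:
        fee = fee / 2  # reduction by 1/2 for small entities
    return fee


# 40 claims filed is an inference from "20 extra" in the table.
fee = national_fee(40, 5, True, Decimal("100.00"), small_entity=True)
print(fee)
```

With the form's entries this gives a pre-reduction total of $890.00 and a small entity total of $445.00, matching the amounts entered above.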

[x] Attached is a [x] check □ money order in the amount of $445.00

□ Authorization is hereby made to charge the amount of $ ______

[x] to Deposit Account No. 14-0740

□ to Credit card as shown on the attached credit card information authorization
form PTO-2038.

WARNING: Credit card information should not be included on this form as it may become public. 

[x] Charge any additional fees required by this paper or credit any overpayment
in the manner authorized above. 

A duplicate of this paper is attached. 

**WARNING: "To avoid abandonment of the application the applicant shall furnish to the United States Patent
and Trademark Office not later than the expiration of 30 months from the priority date: * * * (2)
the basic national fee (see § 1.492(a)). The 30-month time limit may not be extended." 37 C.F.R.
§ 1.495(b).

WARNING: If the translation of the international application and/or the oath or declaration have not been
submitted by the applicant within thirty (30) months from the priority date, such requirements may
be met within a time period set by the Office. 37 C.F.R. § 1.495(b)(2). The payment of the surcharge
set forth in § 1.492(e) is required as a condition for accepting the oath or declaration later than
thirty (30) months after the priority date. The payment of the processing fee set forth in § 1.492(f)
is required for acceptance of an English translation later than thirty (30) months after the priority
date. Failure to comply with these requirements will result in abandonment of the application. The
provisions of § 1.136 apply to the period which is set. Notice of Jan. 7, 1993, 1147 O.G. 29 to
40.

[x] Assertion of Small Entity Status

[x] Applicant hereby asserts status as a small entity under 37 C.F.R. § 1.27.

NOTE: 37 C.F.R. § 1.27(c) deals with the assertion of small entity status, whether by a written specific
declaration thereof or by payment as a small entity of the basic filing fee or the fee for the entry into
the national phase, and states:

"(c) Assertion of small entity status. Any party (person, small business concern or nonprofit 
organization) should make a determination, pursuant to paragraph (f) of this section, of entitlement 
to be accorded small entity status based on the definitions set forth in paragraph (a) of this section, 
and must, in order to establish small entity status for the purpose of paying small entity fees, actually 
make an assertion of entitlement to small entity status, in the manner set forth in paragraphs (c)(1) 
or (c)(3) of this section, in the application or patent in which such small entity fees are to be paid. 

(1) Assertion by writing. Small entity status may be established by a written assertion of entitlement
to small entity status. A written assertion must:

(i) Be clearly identifiable;

(ii) Be signed (see paragraph (c)(2) of this section); and

(iii) Convey the concept of entitlement to small entity status, such as by stating that applicant
is a small entity, or that small entity status is entitled to be asserted for the application or patent.
While no specific words or wording are required to assert small entity status, the intent to assert
small entity status must be clearly indicated in order to comply with the assertion requirement.

(2) Parties who can sign and file the written assertion. The written assertion can be signed by:

(i) One of the parties identified in § 1.33(b) (e.g., an attorney or agent registered with the Office),
§ 3.73(b) of this chapter notwithstanding, who can also file the written assertion;

(ii) At least one of the individuals identified as an inventor (even though a § 1.63 executed oath
or declaration has not been submitted), notwithstanding § 1.33(b)(4), who can also file the
written assertion pursuant to the exception under § 1.33(b) of this part; or

(iii) An assignee of an undivided part interest, notwithstanding §§ 1.33(b)(3) and 3.73(b) of this
chapter, but the partial assignee cannot file the assertion without resort to a party identified under
§ 1.33(b) of this part.

(3) Assertion by payment of the small entity basic filing or basic national fee. The payment, by any
party, of the exact amount of one of the small entity basic filing fees set forth in §§ 1.16(a), (f),
(g), (h), or (k), or one of the small entity basic national fees set forth in §§ 1.492(a)(1), (a)(2), (a)(3),
(a)(4), or (a)(5), will be treated as a written assertion of entitlement to small entity status even if the
type of basic filing or basic national fee is inadvertently selected in error.

(i) If the Office accords small entity status based on payment of a small entity basic filing or basic
national fee under paragraph (c)(3) of this section that is not applicable to that application, any
balance of the small entity fee that is applicable to that application will be due along with the
appropriate surcharge set forth in § 1.16(e), or § 1.16(l).

(ii) The payment of any small entity fee other than those set forth in paragraph (c)(3) of this section
(whether in the exact fee amount or not) will not be treated as a written assertion of entitlement
to small entity status and will not be sufficient to establish small entity status in an application
or a patent."

3. [x] A copy of the International application as filed (35 U.S.C. § 371(c)(2)):

NOTE: Section 1.495(b) was amended to require that the basic national fee and a copy of the international
application must be filed with the Office by 30 months from the priority date to avoid abandonment.
"The International Bureau normally provides the copy of the international application to the Office in
accordance with PCT Article 20. At the same time, the International Bureau notifies applicant of the
communication to the Office. In accordance with PCT Rule 47.1, that notice shall be accepted by all
designated offices as conclusive evidence that the communication has duly taken place. Thus, if the
applicant desires to enter the national stage, the applicant normally need only check to be sure the
notice from the International Bureau has been received and then pay the basic national fee by 30 months
from the priority date." Notice of Jan. 7, 1993, 1147 O.G. 29 to 40, at 35-36. See item 14c below.

a. □ is transmitted herewith. 

b. [x] is not required, as the application was filed with the United States
Receiving Office.

c. □ has been transmitted

i. □ by the International Bureau.

Date of mailing of the application (from form PCT/IB/308):

ii. □ by applicant on (Date)

4. A translation of the International application into the English language
(35 U.S.C. § 371(c)(2)):

a. □ is transmitted herewith.

b. [x] is not required as the application was filed in English.

c. □ was previously transmitted by applicant on . (Date) 

d. □ will follow. 
5. □ Amendments to the claims of the International application under PCT Article 19
(35 U.S.C. § 371(c)(3)):

NOTE: The Notice of January 7, 1993 points out that 37 C.F.R. § 1.495(a) was amended to clarify the existing
and continuing practice that PCT Article 19 amendments must be submitted by 30 months from the
priority date and this deadline may not be extended. The Notice further advises that: "The failure to
do so will not result in loss of the subject matter of the PCT Article 19 amendments. Applicant may
submit that subject matter in a preliminary amendment filed under section 1.121. In many cases, filing
an amendment under section 1.121 is preferable since grammatical or idiomatic errors may be
corrected." 1147 O.G. 29-40, at 36.

a. □ are transmitted herewith. 

b. □ have been transmitted 

i. □ by the International Bureau. 

Date of mailing of the amendment (from form PCT/IB/308):



ii. □ by applicant on (Date) 

c. □ have not been transmitted as 

i. □ applicant chose not to make amendments under PCT Article 19.
Date of mailing of Search Report (from form PCT/ISA/210):



ii. □ the time limit for the submission of amendments has not yet 
expired. The amendments or a statement that amendments have 
not been made will be transmitted before the expiration of the time 
limit under PCT Rule 46.1. 

6. □ A translation of the amendments to the claims under PCT Article 19
(35 U.S.C. § 371(c)(3)):

a. □ is transmitted herewith. 

b. □ is not required as the amendments were made in the English language. 

c. □ has not been transmitted for reasons indicated at point 5(c) above. 

7. [x] A copy of the international examination report (PCT/IPEA/409)

□ is transmitted herewith.

[x] is not required as the application was filed with the United States
Receiving Office.

8. [x] Annex(es) to the international preliminary examination report

a. □ is/are transmitted herewith.

b. [x] is/are not required as the application was filed with the United States
Receiving Office.

9. [x] A translation of the annexes to the international preliminary examination report

a. □ is transmitted herewith.

b. [x] is not required as the annexes are in the English language.
10. [x] An oath or declaration of the inventor (35 U.S.C. § 371(c)(4)) complying with
35 U.S.C. § 115

a. □ was previously submitted by applicant on (Date) 

b. □ is submitted herewith, and such oath or declaration 

i. □ is attached to the application. 

ii. □ identifies the application and any amendments under PCT Article 

19 that were transmitted as stated in points 3(b) or 3(c) and 5(b); 
and states that they were reviewed by the inventor as required by 
37 C.F.R. § 1.70. 

c. [x] will follow.

II. Other document(s) or information included:

11. □ An International Search Report (PCT/ISA/210) or Declaration under
PCT Article 17(2)(a):

a. □ is transmitted herewith. 

b. □ has been transmitted by the International Bureau. 

Date of mailing (from form PCT/IB/308): 

c. □ is not required, as the application was searched by the United States 

International Searching Authority. 

d. □ will be transmitted promptly upon request. 

e. □ has been submitted by applicant on (Date) 

12. [x] An Information Disclosure Statement under 37 C.F.R. §§ 1.97 and 1.98:

a. □ is transmitted herewith. 
Also transmitted herewith is/are: 

□ Form PTO-1449 (PTO/SB/08A and 08B). 

□ Copies of citations listed. 

b. [x] will be transmitted within THREE MONTHS of the date of submission
of requirements under 35 U.S.C. § 371(c).

c. □ was previously submitted by applicant on ______ (Date)

13. □ An assignment document is transmitted herewith for recording. 

A separate □ "COVER SHEET FOR ASSIGNMENT (DOCUMENT) ACCOMPANYING
NEW PATENT APPLICATION" or □ FORM PTO 1595 is also attached.
14. [x] Additional documents:

a. □ Copy of request (PCT/RO/101)

b. □ International Publication No. ______

i. □ Specification, claims and drawing
ii. □ Front page only

c. [x] Preliminary amendment (37 C.F.R. § 1.121)

d. [x] Other

1. Sequence Listing on disk, as well as a paper copy and
statement regarding submission of the same

2. Substitute pages of specification incorporating amendments
submitted herewith.

15. [x] The above checked items are being transmitted

a. [x] before 30 months from any claimed priority date.

b. □ after 30 months. 
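
As a rough check on the 30-month limit, the date can be computed from the priority date of 24 May 1999 stated on this form. This is a minimal sketch: the helper `add_months` is hypothetical, it clamps to the last day of a shorter month as a simplifying assumption, and it ignores any rollover for weekends or federal holidays.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day of month is clamped to the end of the target month
    (an assumption of this sketch, not a statement of Office practice).
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


priority_date = date(1999, 5, 24)          # priority date claimed on this form
deadline = add_months(priority_date, 30)   # 30-month national-stage limit
print(deadline.isoformat())
```

Any receipt or mailing date can then be compared against `deadline` with an ordinary `<=` comparison.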

16. □ Certain requirements under 35 U.S.C. § 371 were previously submitted by the 

applicant on , , namely: 



AUTHORIZATION TO CHARGE ADDITIONAL FEES 

WARNING: Accurately count claims, especially multiple dependent claims, to avoid unexpected high charges
if extra claims are authorized.

NOTE: "A written request may be submitted in an application that is an authorization to treat any concurrent
or future reply, requiring a petition for an extension of time under this paragraph for its timely submission,
as incorporating a petition for extension of time for the appropriate length of time. An authorization to
charge all required fees, fees under § 1.17, or all required extension of time fees will be treated as
a constructive petition for an extension of time in any concurrent or future reply requiring a petition
for an extension of time under this paragraph for its timely submission. Submission of the fee set forth
in § 1.17(a) will also be treated as a constructive petition for an extension of time in any concurrent
reply requiring a petition for an extension of time under this paragraph for its timely submission." 37
C.F.R. § 1.136(a)(3).

NOTE: "Amounts of twenty-five dollars or less will not be returned unless specifically requested within a
reasonable time, nor will the payer be notified of such amounts; amounts over twenty-five dollars may
be returned by check or, if requested, by credit to a deposit account." 37 C.F.R. § 1.26(a).

[x] Please charge, in the manner authorized above, the following additional fees that
may be required by this paper and during the entire pendency of this application:

[x] 37 C.F.R. § 1.492(a)(1), (2), (3), and (4) (filing fees)

WARNING: Because failure to pay the national fee within 30 months without extension (37 C.F.R. § 1.495(b)(2)) 
results in abandonment of the application, it would be best to always check the above box. 
□ 37 C.F.R. § 1.492(b), (c) and (d) (presentation of extra claims)

NOTE: Because additional fees for excess or multiple dependent claims not paid on filing or on later presentation
must only be paid or these claims cancelled by amendment prior to the expiration of the time period
set for response by the PTO in any notice of fee deficiency (37 C.F.R. § 1.492(d)), it might be best
not to authorize the PTO to charge additional claim fees, except possibly when dealing with amendments
after final action.

□ 37 C.F.R. § 1.17 (application processing fees)

□ 37 C.F.R. § 1.17(a)(1)-(5) (extension fees pursuant to § 1.136(a))

□ 37 C.F.R. § 1.18 (issue fee at or before mailing of Notice of Allowance,
pursuant to 37 C.F.R. § 1.311(b))

NOTE: Where an authorization to charge the issue fee to a deposit account has been filed before the mailing
of a Notice of Allowance, the issue fee will be automatically charged to the deposit account at the time
of mailing the notice of allowance. 37 C.F.R. § 1.311(b).

NOTE: 37 C.F.R. § 1.28(b) requires "Notification of any change in loss of entitlement to small entity status must
be filed in the application . . . prior to paying, or at the time of paying . . . issue fee." From the wording
of 37 C.F.R. § 1.28(b): (a) notification of change of status must be made even if the fee is paid as "other
than a small entity" and (b) no notification is required if the change is to another small entity.

□ 37 C.F.R. § 1.492(e) and (f) (surcharge fees for filing the declaration
and/or filing an English translation of an International Application later
than 30 months after the priority date).



Reg. No.: 30901 

Tel. No.: (978) 927-5054, Ext. 292
Customer No.: 28986 




Gregory D. Williams 
General Counsel 

(type or print name of practitioner)
New England Biolabs, Inc. 

32 Tozer Road 

P.O. Address 
Beverly, MA 01915 
IN THE UNITED STATES ELECTED OFFICE (EO/US)

U.S. Application No.: Filing Date: 

International Application No.: PCT/US00/14122 

International Filing Date: 23 May 2000 

Priority Date Claimed: 24 May 1999 

Title of Invention: Method For Generating Split, 

Non-Transferable Genes That Are 
Able To Express An Active Protein 
Product 

Applicant(s): XU, Ming-Qun 

EVANS, Thomas C. 
PRADHAN, Sriharsa 
COMB, Donald G. 
PAULUS, Henry 
SUN, Luo 
CHEN, Lixin 
GHOSH, Inca 

NEW ENGLAND BIOLABS, INC. 

BOSTON BIOMEDICAL RESEARCH INSTITUTE 

BOX PCT 

Commissioner of Patents 

and Trademarks 
Washington, DC 20231 



Sir: 

PRELIMINARY AMENDMENT 

Applicants request that the International Application be 
amended as follows: 

IN THE SPECIFICATION 

On page 73, lines 15-16, replace "May ___, 2000 and
received ATCC Patent Accession No. ___." with --May 23,
2000 and received ATCC Accession No. PTA-1898.--



Applicants have provided a substitute page 73 which 
includes the amended text. 






Applicants have amended the Application at page 73 to
include the ATCC Deposit information (a copy of which is
enclosed herewith), which was not available at the time of
filing the International Application. No new matter has been
added by virtue of the amendment made to the specification.

Applicants affirm that should the microorganism mutate,
become nonviable or be inadvertently destroyed, Applicants will
replace such microorganism for at least thirty (30) years from the 
date of the original deposit, or at least 5 years from the date of 
the most recent request for release of a sample or for the life of 
any patent on the above-mentioned Application, whichever 
period is longer. 
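
The "whichever period is longer" rule in the preceding paragraph amounts to taking the latest of three dates. The sketch below assumes the deposit date of May 23, 2000 given in the amendment; the request date and patent expiry date are hypothetical values chosen only for illustration, and the function names are not drawn from any official source.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Add whole years, mapping Feb 29 to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def replacement_obligation_end(deposit: date, last_request: date,
                               patent_expiry: date) -> date:
    """Latest of: deposit + 30 years, last sample request + 5 years,
    and the end of the patent's life."""
    return max(add_years(deposit, 30),
               add_years(last_request, 5),
               patent_expiry)


deposit_date = date(2000, 5, 23)       # deposit date stated in the amendment
last_request = date(2015, 1, 10)       # hypothetical request date
patent_expiry = date(2020, 5, 23)      # hypothetical expiry date
end = replacement_obligation_end(deposit_date, last_request, patent_expiry)
print(end.isoformat())
```

With these illustrative inputs the 30-year term from the deposit date governs, since it ends later than either of the other two periods.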

Applicants affirm that the deposit has been made under 
conditions of assurance of (i) ready accessibility thereto by the 
public if a patent is granted whereby all restrictions to the 
availability to the public of the culture so deposited will be 
irrevocably removed upon the granting of the Patent (M.P.E.P.
608.01(p)), and (ii) access to the culture will be available during
pendency of the Patent Application to one determined by the 
Commissioner to be entitled thereto under 37 C.F.R. 1.14 and 35 
U.S.C. 122. 



REMARKS 



Respectfully submitted, 



NEW ENGLAND BIOLABS, INC. 




Gregory D. Williams 
(Reg. No. 30901) 
New England Biolabs, Inc. 
32 Tozer Road 

Beverly, Massachusetts 01915 
(978) 927-5054; Ext. 292 
the control of pTac promoter and confers resistance to
ampicillin.

p215EN2 or p235EN2 were constructed by ligating the
NcoI to KpnI fragment of pCE215DnaE or pCE235DnaE into
the same sites of pCEN2. p215EN2 or p235EN2 has the
N-terminus of EPSPS (residues 1-215 for p215EN2, 1-235 for
p235EN2) fused to the INn.

The NcoI to FspI fragment of pCYB3 was ligated into the
NcoI to DraI sites of pKEB1 to generate pKEB12 (NEB#1282).
A sample of pKEB12 plasmid transformed in E. coli strain
ER2566 has been deposited under the terms and conditions
of the Budapest Treaty with the American Type Culture
Collection on May 23, 2000 and received ATCC Patent Deposit
Designation No. PTA-1898. This vector has the C-terminal 36
amino acid residues of the Ssp DnaE intein (INc) fused to CBD
and confers resistance to kanamycin.

pEPS#28 and pEPS#29 were constructed by ligating the
BglII to PstI fragment of pCE215DnaE and pCE235DnaE into
the same sites of pKEB12. pEPS#28 or pEPS#29 has the
C-terminus of EPSPS (residues 216-427 for pEPS#28, 236-427
for pEPS#29) replacing the CBD in pKEB12 and attached to
the C-terminus of INc.
METHOD FOR GENERATING SPLIT, NON-TRANSFERABLE
GENES THAT ARE ABLE TO EXPRESS AN ACTIVE PROTEIN
PRODUCT

BACKGROUND OF THE INVENTION 

In the past few years, agriculture in the United States 
has been revolutionized by the introduction of transgenic 
crops that are resistant to specific diseases, insects, 
herbicides or have improved nutritional value. At the same 
time, much concern has been expressed around the world 
that these genetically modified (GM) agricultural products may 
be harmful to the consumer and that the transgenes could be 
transferred to related plant species so as to generate insect- 
or herbicide-resistant "superweeds" (Ferber, D., Science 
286:1662 (1999)) or consumed by other organisms to their 
detriment (Losey, et al., Nature 399:214 (1999)). Whereas
there is little scientific basis to the fear of harmful effects of 
"GM foods", the possibility that transgenes are transferred to 
other plants and thereby have an adverse ecological impact is 
not entirely unfounded (Bergelson, et al., Nature 395:25
(1998)). Such transfer could occur either by pollination of 
closely related species or by the transfer of gene fragments 
to unrelated plants by viral or plasmid vectors whose 
transmission may be mediated by plant-associated fungi, 
bacteria or insects. 

There have been a number of techniques discussed for
the prevention of transgene spread; however, these
procedures either are designed to have a negative impact on
the new hybrid plant (Gressel, Trends Biotechnol., 17:361-366
(1999)), as in the case of tandem constructs, or will not
eliminate the possibility of spread by horizontal gene transfer
(Bertolla and Simonet, Res. Microbiol., 150:375-384 (1999)).

In this disclosure, we propose a new type of transgene
that allows efficient protein expression but does not require a
gene coupling approach and has a significantly lower chance
of spread by horizontal gene transfer.

SUMMARY OF THE INVENTION 

In accordance with the present invention, there is
disclosed a new type of transgene system that allows
efficient protein expression in a target host such as a plant,
but avoids the undesirable result of the migration of the
transgene into related host systems and/or to the
environment via the pollen. The methods described herein
can also be applied to the expression of virtually any protein
of interest (e.g. a toxic protein) in eukaryotic (yeast, insect,
mammalian cells, etc.) and prokaryotic (E. coli, etc.) organisms.

In each case, the target gene is split into at least two
segments, each of which can be fused to a portion of an intein
coding sequence. Each fusion gene is expressed as an inactive
protein, and these separately expressed fusion proteins are
reassembled into an active form. Compartmentalization of
the gene fragments allows the target protein to be
reconstituted in a desired location and can prevent the
transmission of a functional gene to other organisms.
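
The split-and-reassemble idea can be illustrated with a toy string model. This is only a sketch: the `<INn>` and `<INc>` tags are placeholder markers rather than real intein sequences, the 427-residue target is an arbitrary string, and the function names are hypothetical. The split after residue 215 mirrors the EPSPS constructs (residues 1-215 and 216-427) described on substitute page 73.

```python
from dataclasses import dataclass

# Placeholder tags standing in for the two intein fragments.
# They are illustrative markers only, not real intein sequences.
INTEIN_N = "<INn>"
INTEIN_C = "<INc>"


@dataclass(frozen=True)
class SplitGene:
    n_fusion: str  # N-terminal target segment followed by INn
    c_fusion: str  # INc followed by the C-terminal target segment


def split_target(target: str, split_after: int) -> SplitGene:
    """Split a target sequence after residue `split_after` (1-based)."""
    if not 0 < split_after < len(target):
        raise ValueError("split site must fall inside the target")
    return SplitGene(
        n_fusion=target[:split_after] + INTEIN_N,
        c_fusion=INTEIN_C + target[split_after:],
    )


def trans_splice(n_fusion: str, c_fusion: str) -> str:
    """Model splicing: excise both intein fragments, join the exteins."""
    if not n_fusion.endswith(INTEIN_N) or not c_fusion.startswith(INTEIN_C):
        raise ValueError("both intein fragments are required for splicing")
    return n_fusion[: -len(INTEIN_N)] + c_fusion[len(INTEIN_C):]


# Toy 427-residue target; NOT a real protein sequence.
target = ("ACDEFGHIKLMNPQRSTVWY" * 22)[:427]
genes = split_target(target, 215)
restored = trans_splice(genes.n_fusion, genes.c_fusion)
print(restored == target)
```

In this model neither fusion on its own equals the full target, which is the property the disclosure relies on to keep a functional gene from being transferred as a single unit.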

It should be noted that although the present invention 
is specifically exemplified in agriculture and plant 
biotechnology, the approach proposed here has a much 
broader scope and can be applied to any gene expressed in 
any organism for the prevention of its accidental transfer to 
another organism. 

DESCRIPTION OF THE DRAWINGS 

Figure 1A - Protein Splicing Mechanism. Protein splicing
is a post-translational processing event involving the excision
of an internal protein segment, the intein, from a precursor
protein with the concomitant ligation of the flanking N- and
C-terminal regions (the exteins). Sequence alignment reveals
that there are highly conserved residues at the two splice
junctions: a cysteine or serine residue at the N-terminus of
the intein, His-Asn at the C-terminus of the intein, and Cys,
Ser or Thr as the first residue of the C-terminal extein. These
conserved splice junction residues are directly involved in the
catalysis of peptide bond cleavage and ligation of the protein
splicing reactions. The chemical mechanism of protein splicing
with an intein which has cysteine residues at its N-terminus
and adjacent to its C-terminus is shown in Figure 1: Step 1 -
Formation of a linear thioester intermediate by an N-S acyl
rearrangement of Cys1 at the N-terminus of the intein; Step 2 -
Formation of a branched intermediate by transesterification
involving attack by the Cys immediately following the
C-terminus of the intein on the thioester formed in Step 1;
Step 3 - Excision of the intein by peptide bond cleavage coupled to
succinimide formation involving the intein C-terminal Asn
residue; Step 4 - Spontaneous S-N acyl rearrangement of the
transitory ligation product from a thioester to a stable amide
bond. Protein splicing involving other inteins presumably
proceeds by four analogous chemical steps, except that the
Cys residues shown in Figure 1 can be replaced by Ser or Thr,
so that Steps 1 and 4 are N-O and O-N acyl shifts,
respectively.

Figure 1B - Cartoon of protein splicing.

Figure 2 - Trans-Splicing. 

Figure 2A - The association of the N-terminal and
C-terminal intein fragments aligns the two splice junctions for
the fusion of the N- and C-extein sequences. The splicing
reaction presumably occurs via the same splicing pathway as
the cis-splicing pathway proposed previously.

Figure 2B - Alternatively, in the absence of splicing the
intein could facilitate the association of the two extein
sequences with the subsequent generation of enzymatic
activity. This has been termed intein-mediated
complementation.
Figure 3 - Ssp DnaE intein gene arrangement in
Synechocystis sp. PCC6803. The genome of the blue-green
algae Synechocystis sp. PCC6803 contains the split dnaE gene
with the fragments located 745 kb apart. The naturally
occurring trans-splicing intein fuses the two gene product
fragments to produce an active polymerase.

Figure 4A - Splitting of a target gene. A target gene can
be split into two fragments with partial intein genes fused at
the C- and N-terminal portions. These split genes can be
placed into plant chromosomes so that the following
expression can be reconstituted.

Figure 4B - Containment of a trans-gene. The gene of
interest, in this case an herbicide resistance gene, is divided
into two fragments (target N and target C) and an intein (INn
and INc) is fused to each partial gene. The two gene fusions
are placed on separate, remote locations on the genome.
One of these may be in the chloroplast, the other in the
nuclear genome. The chloroplast located transgene is
transcribed and translated in the chloroplast while the nuclear
transgene is transcribed in the nucleus and translated in the
cytoplasm. Following translation of the nuclear gene it is
transported into the chloroplast with the help of chloroplast
transit peptide where it can associate with the other gene
fragment using the intein as either an association or splicing
element.
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Figure 5 - Trans-splicing of acetolactate synthase (ALS) 
in E. coli strain ER2744. The target gene is split by intein 
fragments (IN n and IN C ) and expressed as two inactive partial 
5 proteins. Protein trans-splicing produces an active target 

protein product in host cells. 

Figure 6 - Sequence alignment for acetolactate synthase 
(ALS) genes (SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:44, 
SEQ ID NO:45 and SEQ ID NO:46). The gap region for E. coli 
acetolactate synthase II (ALSII) is underlined. The arrow 
indicates the split site for E. coli ALSII. The star indicates the 
split site for maize ALS. 

Figure 7 - Plate assay showing that ALSIIm-14 renders 
E. coli ER2744 resistant to valine and the herbicide SM. E. coli 
ER2744 cells were transformed with plasmid DNA expressing 
ALSII protein (1), ALSIIm (2), or ALSIIm-14 (3) and plated on M9 
medium containing 0.3 mM IPTG, with 100 μg/ml of valine (a), 
or with 100 μg/ml valine and 50 μg/ml SM (b). The plate 
assay was performed at 30°C for 50 hours. 

Figure 8 - Production of recombinant ALSIIm-14 through 
Ssp DnaE intein-mediated trans-splicing. 2 μl of whole cell 
extract, from cells transformed with expression plasmids for 
control (lane 1), ALSII (lane 2), ALSIIm(N)-INn (lane 3), 
ALSIIm(C)-INc (lane 4), or ALSIIm(N)-INn and ALSIIm(C)-INc (lane 
5), was run on an SDS-polyacrylamide (12%) gel, transferred 
to an S&S nitrocellulose membrane, and probed with antiserum 
against the ALSII N-terminus (Figure 8A) or against the ALSII 
C-terminus (Figure 8B). (Figure 8C) The efficiency of 
trans-splicing is temperature sensitive. A Western blot was performed 
using an antiserum against the ALSIIm N-terminus. Protein extract 
was made from cells transformed with expression plasmids for 
control (lane 1; E. coli extracts contain a non-specific protein, 
the top band, that reacts with the antiserum), ALSII (lane 2), or 
ALSIIm(N)-INn and ALSIIm(C)-INc (lanes 3 to 6). The cell 
culture temperature was 37°C for lanes 1 to 3, 30°C for 
lane 4, 25°C for lane 5, and 15°C for lane 6. 



Figure 9 - Assays for acetolactate synthase II (ALSII) 
Activity. 

Figure 9A - Co-expression of ALSIIm(N)-INn and 
ALSIIm(C)-INc rescued cell growth on a plate containing valine 
plus herbicide. E. coli ER2744, transformed with expression 
plasmids for ALSII (1), ALSIIm (2), ALSIIm(N)-INn and 
ALSIIm(C)-INc (3), ALSIIm(N)-INn (4), ALSIIm(C)-INc (5), or 
ALSIIm(N) and ALSIIm(C) (6), was plated on M9 medium at 
37°C (a), 37°C with 100 μg/ml valine (b), 30°C with 100 
μg/ml valine (c), and 30°C with 100 μg/ml valine and 50 μg/ml 
sulfometuron methyl (SM) (d). Plates contained 0.3 mM IPTG. 



Figure 9B - Co-expression of ALSIIm(N)-INn and 
ALSIIm(C)-INc rescued cell growth in medium containing valine 
and herbicide. E. coli ER2744, transformed with expression 
plasmids for fusion proteins as indicated under the graph, was 
cultured in M9 medium (0.3 mM IPTG), with or without 100 
μg/ml valine and 50 μg/ml sulfometuron methyl (SM) as 
indicated. OD600 was taken to determine the cell growth rate 
after cells were cultured for 40 hours at 30°C. 

Figure 9C - The time course study on the growth rate of 
cells expressing ALSIIm(N)-INn and ALSIIm(C)-INc. E. coli 
ER2744, transformed with the expression plasmids for 
proteins as indicated, was cultured at 30°C in M9 medium 
(0.3 mM IPTG) with the addition of 100 μg/ml valine. The cell 
density was determined by measuring OD600 at several time 
points as indicated. 



Figure 10 - Western blot detection of the trans-splicing 
product, maize ALS-14. 2 μl of whole cell lysate, from E. coli 
ER2744 cells transformed with expression plasmids for control 
(lane 1) (please note the antibody reacts with a non-specific 
protein in E. coli), cALS (lane 2), cALS(N)-INn (lane 3), 
cALS(C)-INc (lane 4), or cALS(N)-INn and cALS(C)-INc (lane 5), was 
run on a 12% SDS polyacrylamide gel, transferred to an S&S 
nitrocellulose membrane and probed with antiserum against the 
cALS N-terminus (A) or cALS C-terminus (B). cALS indicates 
corn/maize ALS protein. 



Figure 11 - Plating Assay for Ssp DnaE intein Cis-splicing 
Constructs. Plasmids pCE182DnaE, pCE215DnaE, 
pCE235DnaE, and pCE267DnaE encode the 
5-enolpyruvyl-3-phosphoshikimate synthetase (EPSPS) protein with the full 
length Ssp DnaE intein inserted at amino acid positions 182, 
215, 235 and 267, respectively. These were transformed into 
ER2799 E. coli cells (which require the EPSPS protein for 
viability in M9 minimal media), and plated on M9 minimal 
plates. Following incubation at 37°C overnight, individual 
clones on each plate were picked and streaked onto a single 
M9 minimal plate. This master plate was then incubated at 
37°C overnight or RT for 2-3 days. As a control the pCYB3 
plasmid was used as it carries no EPSPS gene, and there is 
no growth on the selection plate. pC+E2, a plasmid which 
contains the full length wild type EPSPS containing a 
Pro101Ser mutation, supports growth on the M9 selection plate 
and also confers glyphosate resistance. 

Figure 12 - Plating Assay for the Ssp DnaE intein 
Trans-splicing Constructs at Positions 215 and 235. 

The activity of each 5-enolpyruvyl-3-phosphoshikimate 
synthetase (EPSPS) trans-splicing construct was assayed by 
co-transforming the matching constructs into E. coli ER2799 
cells and plating on an M9 selection plate. pCYB3 or pKYB1 
(New England Biolabs, Inc., Beverly, MA), which has no EPSPS 
gene present, was used to provide ampicillin or kanamycin 
resistance when testing the activity of each half of the EPSPS 
gene. 



The plasmids used were: pC+E2, which contains the full 
length EPSPS mutant gene; p215EN2, which has the first 215 
amino acids of EPSPS fused to the N-terminal splicing domain 
of the Ssp DnaE intein; p235EN2, which has the first 235 
amino acids of EPSPS fused to the N-terminal splicing domain 
of the Ssp DnaE intein; pEPS#28, which contains amino acids 
216-427 of the EPSPS gene fused to the C-terminal splicing 
domain of the Ssp DnaE intein; pEPS#29, which contains 
amino acids 236-427 of the EPSPS gene fused to the 
C-terminal splicing domain of the Ssp DnaE intein; pEPS#33, 
which has the first 235 amino acids of EPSPS fused to a 
splicing defective N-terminal domain of the Ssp DnaE intein; 
pEPS#37, which has amino acids 236-427 of EPSPS fused to 
a splicing defective C-terminal domain of the Ssp DnaE intein; 
pEPS#34, which has the first 235 amino acids of EPSPS, but 
no intein fragment; and pEPS#36, which has amino acids 
236-427 of EPSPS and no intein fragment. These plasmids were 
co-transformed, in various combinations, into ER2799 E. coli 
cells, and plated on both LB plates and M9 plates; each plate 
was supplemented with 100 μg/mL ampicillin, 50 μg/mL 
kanamycin and 0.3 mM IPTG. Individual clones were picked 
from each LB plate and streaked on one M9 selection plate 
following incubation at 37°C overnight or RT for 2-3 days. 
The M9 minimal media selection plate contained 100 μg/mL 
ampicillin, 50 μg/mL kanamycin and 0.3 mM IPTG. The 
combinations used were: WT, pC+E2 and pKYB; 215NC, 
p215EN2 and pEPS#28; 215C, pEPS#28 and pCYB3; 
235NC-Dead, pEPS#33 and pEPS#37; 235NC, p235EN2 and 
pEPS#29; 235N, p235EN2 and pKYB1; 235C, pEPS#29 and 
pCYB3; 235N-215C, p235EN2 and pEPS#28; and 235 
complement, pEPS#34 and pEPS#36. 

Figure 13 - Glyphosate Resistance Liquid Assay for 235 
Trans-splicing Constructs. The plasmid constructs were as 
described in Figure 12. The combinations used were: WT, 
pC+E2 and pKYB; 235NC-Dead, pEPS#33 and pEPS#37; 
235NC, p235EN2 and pEPS#29; 235N, p235EN2 and pKYB1; 
235C, pEPS#29 and pCYB3; and 235 complement, pEPS#34 
and pEPS#36. These plasmids were co-transformed into 
ER2799 E. coli cells and plated on LB plates, supplemented 
with 100 μg/mL ampicillin and 50 μg/mL kanamycin; 
pCYB3/pKYB were co-transformed into E. coli ER2744, and 
plated on the LB plate, supplemented as described 
previously. A preculture was prepared for each 
transformation by inoculating a fresh colony into LB medium 
containing 100 μg/mL ampicillin and 50 μg/mL kanamycin and 
growing at 30°C overnight. Equal amounts of pre-culture (10-11 μL 
depending on the cell density) were inoculated into freshly 
made M9 minimal medium containing 100 μg/ml of ampicillin, 
50 μg/ml of kanamycin and 0.3 mM IPTG in the absence or 
presence of different amounts of glyphosate. The growth of 
each construct was measured by OD at 600 nm. Figure 13A, 
growth at 37°C. Figure 13B, growth at 30°C. 

Figure 14 - Growth of the cis-splicing 235 construct in M9 
liquid minimal media. A plasmid with the full length Ssp DnaE 
intein inserted into position 235 of 
5-enolpyruvyl-3-phosphoshikimate synthetase (EPSPS) was constructed. Two 
plasmid vectors were created (pCE235DnaE and pEPS#31), 
one with a splicing competent Ssp DnaE intein (235 cis) and 
another with a splicing incompetent intein (235 dead). These 
plasmids were co-transformed with pKEB12 into ER2799 E. coli 
cells and plated on LB plates supplemented with 100 μg/mL 
ampicillin and 50 μg/mL kanamycin. A preculture was 
prepared for each transformation by inoculating a fresh 
colony into LB medium and growing at 30°C overnight. Equal amounts 
of pre-culture (10-11 μL depending on the cell density) were 
inoculated into freshly made M9 minimal medium containing 
100 μg/ml of ampicillin, 50 μg/ml of kanamycin and 0.3 mM 
IPTG. The cell density was determined at various times using 
the OD at 600 nm. 


Figure 15 is a table that shows the sites in the 5- 
enolpyruvyl-3-phosphoshikimate synthetase (EPSPS) protein 
that allow a 5 amino acid insertion and still result in active 
protein. 


Figure 16 is a table that shows the sites in the 5- 
enolpyruvyl-3-phosphoshikimate synthetase (EPSPS) protein 
where a 5 amino acid insertion results in inactive protein. 






Figure 17 is a map of pIH976. Circular double stranded 
DNA with a multiple cloning site. The restriction enzyme sites 
are indicated. Restriction sites in parentheses are not 
unique. Ptac represents the tac promoter. Origin of replication is 
ori. This plasmid has a tetracycline drug resistance marker (Tetr). 

Figure 18 is a map of pAGR3. Circular double stranded 
DNA (SEQ ID NO:76) with a multiple cloning site. The 
restriction enzyme sites are indicated below. Ptac represents 
the tac promoter. Origin of replication is ori. This plasmid has 
an ampicillin drug resistance marker (ampr). Lac operator and 
ribosome binding sites are indicated. Plasmid pAGR3 is an 
expression vector which includes several elements: (1) a 
synthetic tac promoter coupled to a symmetric synthetic lac 
operator sequence; (2) a lac ribosome binding site; (3) a 
polylinker for cloning with the ATG within the NcoI site being 
about seven nucleotides downstream of the ribosome binding 
site; (4) a copy of the lacIq gene to provide repression of the 
tac promoter; (5) the replication origin from pBR322; (6) an 
ampicillin resistance gene; and (7) a four-fold copy of the 
ribosomal transcription terminator upstream of the tac 
promoter. The transcription terminators lower the basal level 
of transcription by reducing read-through transcription from 
upstream promoters. 

Figure 19 - Trans-splicing of two unrelated gene products 
in E. coli using the Ssp DnaE intein as splice element. 

Figure 19A - Plasmid pIHaadE-N represents the aadA gene (in 
black) fused to the N-terminal splicing domain of the Ssp 
DnaE intein (INn in grey). Plasmid pAGRE-CsmGFP 
represents the C-terminal splicing domain of the Ssp DnaE 
intein (INc in grey) and smGFP (in black). The calculated 
molecular mass for each of the partners is indicated below in 
kDa. The arrow indicates a trans-splicing event resulting in an 
aadA-smGFP (57 kDa) fusion protein. 

Figure 19B - Ampicillin and spectinomycin sulphate 
selection of pIHaadE-N and pAGRE-CsmGFP plasmids in E. coli 
cells. E. coli were transformed with the plasmids indicated on 
the right side. Colony numbers are indicated on top. 

Figure 19C - Expression and detection of hybrid 
aadA-smGFP protein through trans-splicing. Western blot analysis 
of E. coli cell extracts expressing the constructs as indicated 
above the figure, using a monoclonal smGFP specific antibody. 
The relative positions of biotinylated MW markers (76, 57, 46, 
37, 28 and 20) are in kDa. The protein bands corresponding 
to the aadA-smGFP hybrid as well as INc-smGFP are indicated. 

Figure 20 is a map of pNCT114/224. Circular double 
stranded DNA with a multiple cloning site capable of targeting 
gene(s) to a predetermined locus. The restriction enzyme sites 
are indicated. PpsbA and TpsbA represent the photosynthetic 
polypeptide D1 gene promoter and terminator, respectively. 
Origin of replication is ori. This plasmid has an ampicillin drug 
resistance marker (ampr). The homologous recombination 
sequences are indicated as left border (orf228-ssb for 
pNCT114 and 16SrDNA-trnaV for pNCT224) and right border 
(orf1244 for pNCT114 and rps7/12 for pNCT224). CS 
represents the cloning sites. 

Figure 21 - Plant promoter PpsbA activity in E. coli and 
trans-splicing of aadA and smGFP. 

Figure 21A - Plasmid p115ag/p225ag represents the aadA 
gene (in black) fused to the Ssp DnaE intein N-terminal 
domain (INn in grey) and the Ssp DnaE intein C-terminal 
domain (INc in grey) fused to smGFP (in black). The two 
hybrid genes are transcribed in opposite directions. The 
calculated molecular mass for each of the partners is indicated 
below in kDa. The arrow indicates a trans-splicing event resulting 
in a fused aadA-smGFP (57 kDa) protein. 

Figure 21B - Ampicillin and spectinomycin sulphate 
selection of p115ag and p225ag plasmids in E. coli cells. E. coli 
were transformed with the plasmids indicated on the right 
side. Colony identities are indicated on top. The digit after the 
plasmid is the isolate number. A plus symbol ("+") indicates 
growth of cells carrying the plasmid in the presence of the 
indicated antibiotics. 

Figure 21C - Expression and detection of hybrid 
aadA-smGFP protein through trans-splicing. Western blot analysis 
of E. coli cell extracts expressing the constructs as indicated 
above the figure, using a monoclonal smGFP specific antibody. 
The relative positions of biotinylated MW markers are to the 
left in kDa. The protein bands corresponding to the aadA-smGFP 
hybrid as well as INc-smGFP are indicated. 



Figure 22 - Splicing in cis in plant cytoplasm. 
5-enolpyruvyl-3-phosphoshikimate synthetase (EPSPS) and 
acetolactate synthase (ALS) genes are inserted into the 
binary vector pBI121. The amino and carboxy terminal 
fragments of EPSPS or ALS are indicated in black. The Ssp 
DnaE intein (Intein) gene is flanked on either side by an 
EPSPS/ALS fragment. The left and right borders of the 
Agrobacterium are indicated as LB and RB. The CaMV 35S promoter, 
NOS promoter (PNOS) and NOS terminator (TNOS) are 
indicated. 



Figure 23 - Nuclear transfer vector pBITPEC or 
pBITPECsmGFP. This binary vector has the CaMV35S promoter 
driving the rubisco3A transit peptide (TP) that is fused to the 
Ssp DnaE intein C-terminal splicing domain (INc). Genes to be 
cloned for organelle transport are indicated after INc. In the case 
of pBITPECsmGFP the smGFP gene is cloned into the multiple 
cloning site. 






Figure 24 is the psbA promoter (PpsbA) sequence (SEQ 
ID NO: 59). 

Figure 25 is the psbA terminator (TpsbA) (SEQ ID 
NO:60). 

Figure 26 is the Rubisco3 transit peptide (SEQ ID 
NO:61). Nucleotides in lower case represent codon optimized 
units. 

Figure 27 is the chloroplast gene targeting vector 
(pNCT114) (SEQ ID NO:62). Features of pNCT114 include: (1) 
vector backbone: pLITMUS28; (2) inserted in BssHII to BsiWI 
the left border (orf228-ssb, 1210 bp) chloroplast genome 
targeting fragment; (3) inserted in AvrII to KpnI the right 
border (orf1244, 1550 bp) chloroplast genome targeting 
fragment; and (4) addition of PpsbA and TpsbA between 
BsiWI and PstI, whereas the other pair is between the AvrII and 
NcoI sites. 

Figure 28 is the chloroplast gene targeting vector 
(pNCT224) (SEQ ID NO:63). Features of pNCT224 include: (1) 
vector backbone: pLITMUS28; (2) inserted in BssHII to BsiWI 
the left border (16SrDNA-trnaV, 1680 bp) chloroplast genome 
targeting fragment; (3) inserted in AvrII to KpnI the right 
border (rps7/12, 1310 bp) chloroplast genome targeting 
fragment; and (4) addition of PpsbA and TpsbA between 
BsiWI and PstI, whereas the other pair is between the AvrII and 
NcoI sites. 

DETAILED DESCRIPTION OF THE INVENTION 



Protein splicing involves the excision of an intervening 
sequence from a polypeptide with the concomitant joining of 
the flanking sequences to yield a new polypeptide (Chong, et 
al., J. Biol. Chem., 271:22159-22168 (1996)), as illustrated in 
Figures 1A and 1B. The elucidation of the mechanism of 
protein splicing has led to a number of intein-based 
applications (Comb, et al., U.S. Patent No. 5,496,714; Comb, 
et al., U.S. Patent No. 5,834,247; Camarero and Muir, J. Amer. 
Chem. Soc., 121:5597-5598 (1999); Chong, et al., Gene, 
192:271-281 (1997); Chong, et al., Nucleic Acids Res., 
26:5109-5115 (1998); Chong, et al., J. Biol. Chem., 
273:10567-10577 (1998); Cotton, et al., J. Am. Chem. Soc., 
121:1100-1101 (1999); Evans, et al., J. Biol. Chem., 
274:18359-18363 (1999); Evans, et al., J. Biol. Chem., 
274:3923-3926 (1999); Evans, et al., Protein Sci., 7:2256- 
2264 (1998); Evans, et al., J. Biol. Chem., 275:9091-9094 
(2000); Iwai and Pluckthun, FEBS Lett. 459:166-172 (1999); 
Mathys, et al., Gene, 231:1-13 (1999); Mills, et al., Proc. Natl. 
Acad. Sci. USA 95:3543-3548 (1998); Muir, et al., Proc. Natl. 
Acad. Sci. USA 95:6705-6710 (1998); Otomo, et al., 
Biochemistry 38:16040-16044 (1999); Otomo, et al., J. Biomol. 
NMR 14:105-114 (1999); Scott, et al., Proc. Natl. Acad. Sci. USA 
96:13638-13643 (1999); Severinov and Muir, J. Biol. Chem., 
273:16205-16209 (1998); Shingledecker, et al., Gene, 
207:187-195 (1998); Southworth, et al., EMBO J. 17:918-926 
(1998); Southworth, et al., Biotechniques, 27:110-120 (1999); 
Wood, et al., Nat. Biotechnol., 17:889-892 (1999); Wu, et al., 
Proc. Natl. Acad. Sci. USA 95:9226-9231 (1998a); Wu, et al., 
Biochim. Biophys. Acta 1387:422-432 (1998b); Xu, et al., Proc. 
Natl. Acad. Sci. USA 96:388-393 (1999); Yamazaki, et al., J. 
Am. Chem. Soc., 120:5591-5592 (1998)). 

Protein splicing in trans has recently been described 
both in vivo and in vitro (Shingledecker, et al., Gene 207:187 
(1998); Southworth, et al., EMBO J. 17:918 (1998); Mills, et 
al., Proc. Natl. Acad. Sci. USA, 95:3543-3548 (1998); Lew, et 
al., J. Biol. Chem., 273:15887-15890 (1998); Wu, et al., 
Biochim. Biophys. Acta 1387:422-432 (1998b); Yamazaki, et al., J. 
Am. Chem. Soc. 120:5591 (1998); Evans, et al., J. Biol. Chem. 
275:9091 (2000); Otomo, et al., Biochemistry 38:16040- 
16044 (1999); Otomo, et al., J. Biomol. NMR 14:105-114 
(1999); Scott, et al., Proc. Natl. Acad. Sci. USA 96:13638-13643 
(1999)) and provides the opportunity to express a protein as 
two inactive fragments that subsequently can undergo 
ligation to form a functional product (Figure 2). 

Trans-protein splicing also occurs naturally in 
Synechocystis sp. PCC6803 (Wu, H., et al., Proc. Natl. Acad. Sci. 
95:9226 (1998)), where it is essential for forming a functional 
DNA polymerase III by joining two fragments of the DnaE 
protein, encoded by two genes separated by 750 kb of 
chromosomal DNA (Figure 3). 

These observations led the present inventors to 
investigate whether a functional gene product could be 
generated by splitting the gene of interest into two fragments 
and fusing an intein fragment to each partial target gene. 




Expression of the two protein fragments followed by trans- 
splicing, intein mediated complementation, or protein 
complementation would generate an active form of the target 
protein (Figure 4). In this scenario the target gene fragments 
can be located anywhere in the host genome, including being 
widely separated in the nucleus, chloroplast, mitochondria, 
plasmids, bacterial artificial chromosomes, yeast artificial 
chromosomes, or any combination of these. Furthermore, by 
placing the gene fragments into different organelles or 
plasmids, such as one half in the nucleus and the other half in 
the chloroplast or mitochondria of a plant, the transfer of both 
gene halves, needed to reconstitute the fully active target 
protein, for example, to a distant relative by pollination or by 
horizontal gene transfer via a bacterial, fungal, or viral vector 
would be virtually eliminated. This would greatly reduce and 
possibly eliminate the risk of the spread of a transgene 
outside of its relevant environment. 
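
The split-and-reconstitute scheme described above can be illustrated with a minimal Python sketch. It is a toy model only: the sequence, the split site, the placeholder tags `IN_N` and `IN_C`, and the names `Fusion`, `split_target` and `trans_splice` are all hypothetical and are not taken from the Examples. Splicing is modeled simply as joining the two extein fragments when both intein halves are present.

```python
from dataclasses import dataclass

# Placeholder tags standing in for the two intein halves (hypothetical;
# no real intein sequence is reproduced here).
IN_N = "<INn>"
IN_C = "<INc>"


@dataclass(frozen=True)
class Fusion:
    extein: str    # fragment of the target protein
    intein: str    # IN_N or IN_C
    location: str  # compartment carrying the gene, e.g. "nucleus"


def split_target(protein: str, site: int, loc_n: str, loc_c: str):
    """Split `protein` after residue `site` (1-based) into two intein fusions."""
    if not 0 < site < len(protein):
        raise ValueError("split site must fall inside the protein")
    return (Fusion(protein[:site], IN_N, loc_n),
            Fusion(protein[site:], IN_C, loc_c))


def trans_splice(n_fusion, c_fusion):
    """Return the joined exteins if both intein halves are present, else None."""
    if n_fusion is None or c_fusion is None:
        return None
    if n_fusion.intein != IN_N or c_fusion.intein != IN_C:
        return None
    return n_fusion.extein + c_fusion.extein


target = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary toy sequence
n_half, c_half = split_target(target, 10, "nucleus", "chloroplast")

full = trans_splice(n_half, c_half)   # both halves present
partial = trans_splice(n_half, None)  # only one half transferred
```

In this sketch a recipient that acquires only one fusion (`partial`) has no complete product, which is the containment property the paragraph relies on.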

Two examples of splitting a target gene and 
reconstituting activity using a protein splicing element are 
described below. The two genes investigated were mutant 
forms of the acetolactate synthase (ALS) gene from 
Escherichia coli and the 5-enolpyruvyl-3-phosphoshikimate 
synthetase (EPSPS) gene from Salmonella typhimurium, which 
confer resistance to the sulfonylurea and glyphosate 
herbicides, respectively. Both enzymes are involved in the 
biosynthesis of protein building blocks. ALS is the first 
common enzyme in the biosynthesis of branched-chain amino 
acids (LaRossa and Schloss, J. Biol. Chem., 259:8753-8757 
(1984); Chaleff and Ray, Science, 223:1148-1151 (1984); 
Falco and Dumas, Genetics, 109:21-35 (1985)) while EPSPS is 
required in the synthesis of aromatic amino acids (Stalker, et 
al., J. Biol. Chem. 260:4724-4728 (1985)). Inhibition of these 
enzymes by chemical compounds can lead to the death of the 
organism. 



The commonly used sulfonylurea herbicides (SU), such 
as sulfometuron methyl (SM) (Short and Colburn, Toxicol. Ind. 
Health, 15:240-275 (1999)), block the growth of bacteria, 
yeast and higher plants by inhibiting acetolactate synthase 
(ALS) (EC 4.1.3.18). In order to generate herbicide resistant 
plants, there was a great effort in identifying a mutant ALS 
gene which permits growth in the presence of SM. The 
mutations which render bacteria and yeast resistant to SM 
were the first to be reported (Hill, et al., Biochem. J., 335:653- 
661 (1998)). Subsequently, similar point mutations were 
confirmed in the ALS genes isolated from naturally occurring 
resistant crops, corn, cocklebur and tobacco (Lee, et al., EMBO 
J., 7:1241-1248 (1988); Bernasconi et al., J. Biol. Chem., 
270:17381-17385 (1995)). Some of these SU tolerant crops, 
such as corn ICI8532 IT and Pioneer 3180 IR, have been 
commercialized. 



In Example I below, the herbicide resistant gene was 
split and an intein fragment fused in-frame to each partial 
gene. The split gene was determined to confer resistance to 
the herbicide SM in E. coli. E. coli was used as a model system 
since it contains the active ALSI and acetolactate synthase III 
(ALSIII) enzymes, but not an active ALSII. ALSI and ALSIII are 
the two isoforms of ALS in E. coli which are crucial for 
the synthesis of valine, isoleucine and leucine (DeFelice, et al., 
Ann. Microbiol. (Paris) 133A:251-256 (1982)). Their activity is 
sensitive to valine feedback inhibition. Therefore, by 
saturating the growth medium with valine, ALSI and III will be 
inhibited and the cells will stop growing. By introducing a 
recombinant ALSII into E. coli cells, their growth will be 
rescued since ALSII is resistant to valine inhibition. This 
feature makes E. coli strain ER2744 a good in vivo model 
system for investigating the activity of the E. coli ALSII gene 
genetically modified by a linker insertion or a trans-splicing 
intein element. 
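
The logic of this selection can be summarized as a small truth table. The Python sketch below is an illustrative model, not part of the Examples: growth is predicted whenever at least one ALS enzyme tolerates every inhibitor in the medium. The valine sensitivity of ALSI/III and the valine resistance of ALSII come from the text above; the assumptions that endogenous ALSI/III and unmutated ALSII are sensitive to SM, and the names `TOLERATES` and `grows`, are introduced here for illustration only.

```python
from typing import Optional

# Inhibitors each enzyme tolerates in this toy model.
TOLERATES = {
    "ALSI/III": set(),           # endogenous; valine-sensitive (text), SM-sensitive (assumed)
    "ALSII": {"valine"},         # valine-resistant (text), SM-sensitive (assumed)
    "ALSIIm": {"valine", "SM"},  # herbicide resistant mutant
}


def grows(inhibitors: set, recombinant: Optional[str] = None) -> bool:
    """Predict growth: True if some ALS enzyme tolerates all inhibitors present."""
    enzymes = ["ALSI/III"]
    if recombinant is not None:
        enzymes.append(recombinant)
    return any(inhibitors <= TOLERATES[name] for name in enzymes)


no_additive = grows(set())                          # plain M9 medium
valine_only = grows({"valine"})                     # host enzymes inhibited
valine_alsii = grows({"valine"}, "ALSII")           # rescued by ALSII
valine_sm_alsii = grows({"valine", "SM"}, "ALSII")
valine_sm_alsiim = grows({"valine", "SM"}, "ALSIIm")
```

Under these assumptions, only a cell expressing a functional herbicide resistant ALSII survives on medium containing both valine and SM, which is why growth on that medium serves as the readout for a reconstituted split enzyme.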



The second herbicide resistant gene tested was the 
aroA gene from Salmonella typhimurium that has a C301 to T 
mutation (Stalker, et al., J. Biol. Chem. 260:4724 (1985)). This 
encodes the 5-enolpyruvyl-3-phosphoshikimate synthetase 
(EPSPS) (EC 2.5.1.19) protein with a Pro101 to Ser change 
and is known to confer resistance to the herbicide glyphosate 
(commonly marketed as Round-Up®). In this embodiment, an 
N-terminal fragment of the EPSPS gene was fused to the 
N-terminal splicing domain of the Ssp DnaE intein and the 
C-terminal fragment of the EPSPS gene was fused to the 
C-terminal splicing domain of the Ssp DnaE intein. In order to 
determine the sites in the EPSPS protein that would tolerate 
the insertion of an intein, a linker scanning experiment was 
performed (Biery, et al., Nucleic Acids Res., 28:1067-1077 
(2000)) (GPS®-LS from New England Biolabs, Inc., Beverly, MA) 
that randomly inserted 5 amino acids throughout the protein 
sequence. Inteins were inserted into those sites found to be 
tolerant of amino acid insertion. Trans-splicing constructs 
were then created that placed the gene fusion encoding the 
N-terminal fragment of EPSPS fused to the N-terminal domain 
of the Ssp DnaE intein on one plasmid and the C-terminal 
portion of EPSPS fused to the C-terminal splicing domain of 
the Ssp DnaE intein on another plasmid. For example, the 
EPSPS protein could be split at the site corresponding to 
Gly235. The two plasmids were co-transformed into E. coli 
cells which lacked a functional EPSPS protein and cell growth 
on M9 minimal media in the presence or absence of the 
herbicide glyphosate was observed. 
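
As a bookkeeping aid for the split-site constructs, the following Python sketch checks whether a given N-terminal and C-terminal pairing covers the 427-residue EPSPS sequence without a gap or an overlap. The residue ranges are those given for p215EN2, p235EN2, pEPS#28 and pEPS#29 in the description of Figure 12; the function name `junction` and the dictionary layout are hypothetical, and the check says nothing about whether a pairing is enzymatically active.

```python
EPSPS_LENGTH = 427

# (first residue, last residue) of the EPSPS portion carried by each construct,
# as listed in the description of Figure 12.
N_FRAGMENTS = {"p215EN2": (1, 215), "p235EN2": (1, 235)}
C_FRAGMENTS = {"pEPS#28": (216, 427), "pEPS#29": (236, 427)}


def junction(n_plasmid: str, c_plasmid: str) -> str:
    """Describe how the two EPSPS fragments meet at the split site."""
    n_end = N_FRAGMENTS[n_plasmid][1]
    c_start = C_FRAGMENTS[c_plasmid][0]
    if c_start == n_end + 1:
        return "contiguous"
    if c_start <= n_end:
        return f"residues {c_start}-{n_end} duplicated"
    return f"residues {n_end + 1}-{c_start - 1} missing"


pair_215 = junction("p215EN2", "pEPS#28")    # the 215NC combination
pair_235 = junction("p235EN2", "pEPS#29")    # the 235NC combination
mismatched = junction("p235EN2", "pEPS#28")  # the 235N-215C combination
```

The mismatched 235N-215C pairing carries residues 216-235 on both fragments, so it is not a clean reconstitution of the full-length sequence.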

The activity of both the split ALS and the split EPSPS 
herbicide resistant genes was observed whether the intein 
was unmodified or had its catalytic residues changed, thus 
eliminating trans-splicing activity. This indicated that although 
splicing would generate a covalently attached protein 
product, it is not necessary to do so in every situation. The 
intein in this manifestation would work as an affinity domain 
to bring the two protein fragments together and in the correct 
orientation. In these experiments the presence of the intein 
was absolutely required for activity of the split proteins. This 
is based on the observation that both the split ALS and 
EPSPS genes without an intein fusion were not able to allow 
E. coli growth on the appropriate herbicides. 

In one embodiment of the invention, two gene 
fragments, fused to an intein splicing domain, are introduced 
independently into nuclear chromosomes, using selectable 
markers such as resistance to an antibiotic or other growth 
inhibitors to verify gene transfer. Independent transfer of the 
two fusion genes will assure a remote location on the plant 
genome, probably on separate chromosomes, thus excluding 
the possibility that both genes could be acquired by a single 
virus or plasmid vector for transfer to other organisms. If so 
desired, the remote location of the two genes can be assured 
by targeting to specific sites by homologous recombination 
with known DNA sequences. 

In another embodiment, one of the two fusion genes 
is transformed into the cell nucleus and the other into 
chloroplasts, so as to eliminate virtually any chance of gene 
transfer to related plants by any conceivable mechanism, 
including cross-pollination of related species, since only 
inactive fragments of the gene would be present in the 
pollen. The gene fragments in chloroplasts are maternally 
transmitted and cannot be transmitted through pollen. The 
same consideration would apply to gene fragments 
expressed in mitochondria. 

This technology may also be applied to non-plant 
systems. By way of example, a transgene to be 
compartmentalized could be split and an intein fused to the 
gene fragments. In the case of bacteria, the split genes are 
preferably placed far apart on the bacterial chromosome using 
standard chromosomal transformation techniques. As a 
further control measure the gene segments may also be 
arranged in opposite orientations. Another manifestation of 
this method is to split a target transgene in two and fuse the 
halves to the appropriate intein domains prior to insertion of the split 
gene into a eukaryotic cell to prevent the transgene's activity 
from being spread to the environment or neighboring cells. 
The split gene fragments are also placed far apart on the eukaryotic 
chromosome or placed on separate chromosomes. 
Furthermore, the gene fragments may be located in separate 
organelles such as the nucleus and mitochondria. The gene 
fragment in mitochondria is maternally transmitted. 

One application of the present invention is in preventing 
the spread of complete transgenes to the environment from 
transgenic plants. This is accomplished by splitting the 
transgene into two or more fragments and fusing 
these to intein fragments. The partial transgene fusions are 
located in separate compartments, such as one portion in the 
nuclear DNA and the second portion in the chloroplast DNA. 
Following expression of the partial genes, the protein 
fragments are directed to the site of activity where they 
associate to reconstitute the target protein activity. Only the 
transgene fragment present in the nucleus is spread through 
pollen since the chloroplast DNA is passed to the next 
generation only maternally. This will vastly reduce the spread 
of the complete transgene to the environment. 
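
The containment argument can be made concrete with a short Python sketch. It is a deliberately simplified model under stated assumptions: plastid inheritance is treated as strictly maternal, and the nuclear compartment of the offspring is taken as the union of both parents' nuclear fragments, which ignores Mendelian segregation and therefore gives an upper bound on what can be inherited. The names `offspring_fragments` and `has_complete_transgene` and the fragment labels are hypothetical.

```python
def offspring_fragments(seed_parent: dict, pollen_parent: dict) -> dict:
    """Fragments an offspring could inherit under strictly maternal plastid inheritance."""
    return {
        # Upper bound: any nuclear fragment from either parent may be inherited.
        "nucleus": seed_parent["nucleus"] | pollen_parent["nucleus"],
        # Pollen contributes no chloroplast DNA in this model.
        "chloroplast": set(seed_parent["chloroplast"]),
    }


def has_complete_transgene(plant: dict) -> bool:
    """True if the plant carries both halves of the split transgene."""
    carried = plant["nucleus"] | plant["chloroplast"]
    return {"N-fragment", "C-fragment"} <= carried


transgenic = {"nucleus": {"N-fragment"}, "chloroplast": {"C-fragment"}}
wild_relative = {"nucleus": set(), "chloroplast": set()}

# Pollen from the transgenic plant fertilizes a wild relative.
outcross = offspring_fragments(seed_parent=wild_relative, pollen_parent=transgenic)
# The transgenic line itself is propagated by selfing.
selfed = offspring_fragments(seed_parent=transgenic, pollen_parent=transgenic)
```

In this model the outcrossed offspring can receive at most the nuclear fragment, while the selfed transgenic line retains both halves.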

Another advantage of the present invention is that the 
host cells expressing only one inactive fusion protein species 
of a protein can be handled safely, thereby reducing the risk 
of exposing humans and the environment to the target 
protein, which may be a toxin, etc. Also, splitting a target 
gene into two separate loci greatly reduces the chance of 
transferring the entire protein coding sequence into other 
organisms through DNA carriers (plasmid, virus, cosmid, etc.) 
or other means (cell fusion, etc.). One hypothetical case is to 
express a toxic gene, for example the diphtheria toxin gene. The 
diphtheria toxin protein is extremely toxic to 
human and animal cells and needs to be handled extremely 
carefully. This protein has been tested in preclinical and 
clinical phase I trials for use as a drug to eradicate tumor cells 
(Kelley, Proc. Natl. Acad. Sci. USA 85(11):3980-3984 (1988); 
Alexander, Neuron 3(1):133-139 (1989); Maxwell, et al., 
Cancer Res. 51(16):4299-4304 (1991); Madshus, J. Biol. Chem., 
269(26):17723-17729 (1994); Murphy and vanderSpeck, 
Semin. Cancer Biol. 6(5):259-267 (1995); Rozemuller and 
Rombouts, Leukemia, 12(5):710-717 (1998); Veggeberg, Mol. 
Med. Today 4(3):93 (1998); Kreitman, Current Opin. Immunol., 
11(5):570-578 (1999); Vallera, et al., Protein Eng. 12(9):779- 
785 (1999)). Therefore it would be advantageous to split the 
diphtheria toxin gene into two intein fusion DNA segments 
and express them in two different bacteria or yeast strains. 
The two fusion proteins can be mixed, when needed, to 
assemble the toxin. 

Thirdly, by compartmentalizing at least one of the
fragments of the target gene into an organelle that is subject
to maternal inheritance (e.g., chloroplasts or mitochondria),
the genetic transfer of the functional gene to related
organisms through processes such as cross-pollination can be
avoided.



The invention described may also be utilized as a means
for expressing any gene of interest in transgenic animals.
Transgenic animal models have been widely used as a
scientific tool to conduct biomedical studies or to produce
desired proteins. Transgenic mice and other transgenic
animals, such as transgenic fish, frog, rat, cow, pig, etc., have
been shown to express human genes (or a foreign gene) for
research and commercial purposes, such as production of a
vaccine or therapeutic agent, or for use as an animal model for
human disease (Alexander, Neuron 3(1):133-139 (1989);
Groner, et al., J. Physiol. 84(1):53-77 (1990); Patil, et al.,
Neuron 4(3):437-447 (1990); Aloe, et al., Growth Factors
9(2):149-155 (1993); Aguzzi, et al., Brain Pathol. 4(1):3-20
(1994); Groner, et al., Biomed. Pharmacother. 48(5-6):231-240
(1994); Schorderet, Experientia 51(2):99-105 (1995)). One of
the concerns is that the transgenic animal may acquire an
undesired foreign gene and pass it on to the next generation
and thereafter. This would result in genetically altered animal
strains, which may have unforeseen social and ethical
consequences. In accordance with the present invention,
such a transgene can be split into two inactive fusion DNA
fragments. One of them could be genetically integrated into
an animal genome and the other fragment could be supplied
by a DNA carrier (such as a virus, etc.) which cannot be
incorporated into the genome. Therefore, when one fusion
protein from the animal and the other from the DNA carrier
are co-expressed, the fusion proteins will reassemble, trans-splice
and produce an active protein. This gene arrangement can
prevent animals from acquiring an intact foreign gene,
thereby avoiding genetic contamination.

The compartmentalization of two gene fragments is an
extension of trans-splicing. The protein in question is divided
into fragments and the corresponding split genes are placed
on the same or different DNA molecules. For example, the
genes for the two halves of the DnaE protein from
Synechocystis sp. PCC 6803 with the Ssp DnaE or Ssp DnaB
intein splicing domains fused to the appropriate fragments
(Wu, et al., Proc. Natl. Acad. Sci. USA, 95:9226-9231 (1998a);
Wu, et al., Biochim. Biophys. Acta 1387:422-432 (1998b)) could
be divided so that one half is in the nucleus and the second
half is in the mitochondria of a specified organism.

In carrying out the present invention, one must employ
one or more of the following methods:

(1) identifying a suitable split site on the target
transgene;

(2) the methodology for splitting the gene into two or
more fragments and fusing each fragment to a split intein;

(3) the methodology for successfully reconstituting the
split gene product into a functional enzyme or protein;

(4) the methodology for screening the host cell or
organism for active gene product;

(5) location of split gene sequences in the relevant
cellular compartment;

(6) a method of splitting the target gene into more
than two fragments;

(7) use of protein complementation to prevent
transgene spread; and

(8) introduction of the transgene.

(1) A method for identification of a suitable split site
on any transgene

One preferred method for identifying a split site on the
transgene is based on the structural analysis of the protein of
interest or its analogs and on sequence homology. This
approach involves studying the known biochemical and X-ray,
NMR or related structural information in order to determine a
preferable intein insertion site and/or sites to divide the
protein into fragments. In particular, one should determine
which are the pertinent reactive amino acid residues and their
spacing and spatial arrangement within the protein. If
possible, it may be ideal to split the target gene so that
catalytic amino acids are distributed onto each fragment. This
will increase the likelihood that neither fragment will have
activity alone. The protein split site may be anywhere in the
protein, but initial sites for testing should be loops or linkers
present between secondary motifs such as beta sheets or
alpha helices. The first loops chosen should not be part of
the catalytic site, although the eventual split site may be
located there. As a first trial, the preferred split site would be
a loop or linker region between two folding domains within a
protein. This increases the possibility that the protein
fragments will fold properly when expressed separately.

If no biochemical or structural information is available for
the protein of interest, then the alignment of similar protein
sequences from different organisms or of similar protein
sequences from the same organism may be informative. The
protein alignment could be performed by traditional sequence
comparison methods or by using any of a variety of computer
programs such as GCG (Genetics Computer Group, Madison,
WI). Regions of high conservation between similar proteins in
all likelihood represent areas of general importance, and
splitting the protein in a region of high conservation should be
reserved for later testing. Instead one should determine
regions of low conservation, preferably regions that also vary
in amino acid number, that lie between regions of high
conservation. The low conservation indicates that there is a
low probability of a catalytic residue being present, and the
variation in amino acid residue length indicates that the exact
spacing between the conserved domains may not be dictated
by this stretch of amino acids. These properties would be
advantageous for a site of intein insertion and splitting a
target protein.
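
The alignment-based selection described above can be sketched in a few
lines of code. This is only an illustration under stated assumptions: the
function names, the thresholds (`low`, `high`) and the toy alignment are
hypothetical, and conservation is scored simply as the fraction of
sequences sharing the most common residue in a column. It is not the
procedure of any particular alignment package.

```python
from collections import Counter


def column_conservation(alignment):
    """Fraction of sequences sharing the most common residue per column.
    Gaps ('-') never count as a shared residue."""
    n = len(alignment)
    scores = []
    for col in zip(*alignment):
        residues = [c for c in col if c != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / n)
    return scores


def candidate_split_regions(alignment, low=0.5, high=0.9):
    """Return (start, end) column ranges (end exclusive) that are poorly
    conserved, contain at least one gap (length variation), and are
    flanked on both sides by a highly conserved column."""
    scores = column_conservation(alignment)
    has_gap = ["-" in col for col in zip(*alignment)]
    regions = []
    i = 0
    while i < len(scores):
        if scores[i] <= low:
            j = i
            while j < len(scores) and scores[j] <= low:
                j += 1
            flanked = (
                i > 0 and j < len(scores)
                and scores[i - 1] >= high and scores[j] >= high
            )
            if flanked and any(has_gap[i:j]):
                regions.append((i, j))
            i = j
        else:
            i += 1
    return regions


# Toy alignment (hypothetical sequences, equal aligned length).
toy = [
    "MKVLAGTS--PEDWRC",
    "MKVLAQNHGGAEDWRC",
    "MKVLA-DT-KSEDWRC",
]
regions = candidate_split_regions(toy)
print(regions)
```

For the toy alignment, the variable, gap-containing stretch between the
two conserved blocks is reported as the first region to test.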



Also, when choosing the site to insert an intein in the
protein of interest one should test sites that possess amino
acid residues favorable for the splicing activity of the intein
being tested. Preferably, a site in the target protein that is
similar or identical to the naturally occurring extein residues of
the intein under investigation is chosen. Alternatively,
residues known to facilitate proficient splicing may be inserted
together with the intein. In this case, following the splicing
reaction these residues would be present in the sequence of
the spliced product and may alter the activity of the target
protein. The effect of these extra residues on the target
protein should be tested by inserting the extra amino acids
into the target protein and checking for the desired property
or activity.

Another preferred method is based on systematic
scanning of a protein of interest by random linker insertion.
Linker scanning can be performed by many methods (Gustin,
et al., Methods Mol. Biol. 130:85-90 (2000); Hobson, et al.,
Methods Mol. Biol. 57:279-285 (1996); Biery, Nucleic Acids Res.
28:1067-1077 (2000)). This protocol generates a library of
genes with extra stretches of DNA randomly inserted
throughout. When this library is translated it produces a set
of proteins with extra amino acid residue(s) inserted in
different positions. The library is then screened for the
desired property of the target protein. For example, if the
target protein confers resistance to an herbicide then the
library is screened to determine which of the proteins with the
extra amino acid residues can allow growth of the target
organism in the presence of the herbicide. A list of sites in the
protein that can tolerate extra amino acids is created. If
structural or biochemical information is available, this list can
be compared with the known information. An ideal case
would involve choosing a split site that tolerates the extra
amino acid insertion, is present in a linker or loop region,
and results in catalytic residues being located on different
fragments. If no structural information is available then one
would preferably begin by splitting the gene at the tolerant
site closest to the middle of the target protein and continue
testing split sites outward from there until the desired activity
can be reconstituted. In both methods a preferred insertion
site would also possess the native extein sequence for the
intein being used, although this is not required. The fusion
proteins may have optimized amino acid residues at the splice
junctions that allow for a functional product to be assessed.
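
The "start near the middle and work outward" rule described above, for the
case where no structural information is available, amounts to a simple
ordering of the tolerant sites. The sketch below assumes that tolerant
sites are recorded as residue positions. The function name and the example
positions are hypothetical and are not taken from any experiment.

```python
def order_split_sites(tolerant_sites, protein_length):
    """Order linker-tolerant positions by distance from the middle of the
    protein; ties are broken in favor of the lower position."""
    middle = protein_length / 2
    return sorted(tolerant_sites, key=lambda pos: (abs(pos - middle), pos))


# Hypothetical 300-residue protein with four insertion-tolerant positions.
testing_order = order_split_sites([30, 120, 180, 260], protein_length=300)
print(testing_order)
```

Each site in the resulting order would then be tested in turn until the
desired activity can be reconstituted.
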
(2) A method for splitting a gene and fusing each
gene fragment in-frame to a split intein coding
sequence

Once a site to split a gene of interest has been
determined (see above), the target gene is split into
two or more fragments using common genetic techniques
(Sambrook, et al., Molecular Cloning: A Laboratory Manual, 2nd
Edition, Cold Spring Harbor Laboratory, NY: Cold Spring
Harbor Laboratory Press (1989)). For example, PCR primers,
with appropriate restriction sites, may be designed so that
one corresponds to the start of the target gene and the
other to the sequence at the split site. Another set of PCR
primers may be designed that correspond to the split site and
the other end of the target gene. The two target gene
fragments are then amplified by PCR (Sambrook, et al., supra)
and cloned into a plasmid vector with the same unique cloning
sites present in the PCR primers. Once cloned into separate
vectors, intein fragments would be fused to the target genes.
In one method, the C-terminal end of DNA coding for an
N-terminal portion of the target protein would be fused to the
N-terminal end of the DNA coding for an N-terminal portion of
the intein, and, in a separate fusion, the N-terminal end of
DNA coding for a C-terminal portion of the target protein
would be fused to the C-terminal end of DNA coding for a
C-terminal portion of the intein.
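
The two in-frame fusions described above can be sketched at the sequence
level as follows. This is a minimal illustration, assuming that the target
coding sequence starts with ATG and ends with a stop codon and that the
intein fragment sequences are supplied without start or stop codons. The
function name, the toy target sequence and the intein placeholder
sequences are hypothetical (they are not real intein sequences), and the
restriction sites used for cloning are omitted.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def build_split_fusions(target_cds, split_after_codon, intein_n_cds, intein_c_cds):
    """Return (N-fusion, C-fusion) coding sequences.

    N-fusion: target N-terminal part + intein N-terminal part + stop codon.
    C-fusion: start codon + intein C-terminal part + target C-terminal part.
    """
    for name, seq in (("target", target_cds),
                      ("intein N", intein_n_cds),
                      ("intein C", intein_c_cds)):
        if len(seq) % 3 != 0:
            raise ValueError(f"{name} sequence length is not a multiple of 3")
    if target_cds[-3:] not in STOP_CODONS:
        raise ValueError("target CDS is expected to end with a stop codon")
    coding_codons = len(target_cds) // 3 - 1  # excludes the stop codon
    if not 0 < split_after_codon < coding_codons:
        raise ValueError("split site must fall inside the coding region")
    cut = 3 * split_after_codon  # cutting on a codon boundary keeps the frame
    n_fusion = target_cds[:cut] + intein_n_cds + "TAA"
    c_fusion = "ATG" + intein_c_cds + target_cds[cut:]
    return n_fusion, c_fusion


# Toy inputs: 6 coding codons plus stop; placeholder intein fragments.
target = "ATGGCTAAAGGTCCTGAATAA"
n_fusion, c_fusion = build_split_fusions(
    target, split_after_codon=3,
    intein_n_cds="TGTCTGTCT", intein_c_cds="GCTAACTGT",
)
print(n_fusion)
print(c_fusion)
```

Cutting on a codon boundary and checking that every part is a multiple of
three nucleotides is what keeps both fusions in frame in this sketch.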



These gene fragment fusions are then transferred to
the same or separate expression vectors and transformed
into bacterial or eucaryotic cells, existing as single or
multicellular organisms, to screen for the desired activity of
the target protein. It should be noted that the gene
fragments in question could be cloned using restriction sites
within or external to the intein gene, present either naturally
or added by mutation. Also, recombination sites may be used
instead of restriction enzyme sites for the movement of the
gene by recombination. The gene or gene fragments may
then be transferred and/or expressed from a plasmid vector,
a viral genome or the genome of a bacterial, eucaryotic, or
archaeal organism. One preferred method is to utilize a
naturally occurring trans-splicing intein, for example the intein
from the dnaE gene of Synechocystis species PCC6803 (Wu, et
al., Proc. Natl. Acad. Sci. USA 95:9226-9231 (1998)). However,
any of the known inteins could be used (See InBase at
http://www.neb.com/neb/frame_tech.html; Perler, et al.,
Nucleic Acids Res. 28:344-345 (2000)). This would involve
splitting the full length intein in order to generate the desired
affinity or trans-splicing domains. One method would be to
split the full length intein in the linker region between the
blocks B and F of the protein splicing domains (Pietrokovski,
Protein Sci. 7:64-71 (1998); Perler, et al., Nucleic Acids Res.
25:1087-1093 (1997); Perler, et al., Nucleic Acids Res.
28:344-345 (2000)).



(3) Creating a functional protein from expressed
split fragments

The next step is to use an intein as an affinity domain to
facilitate complementation and reconstitution of the N- and
C-terminal halves of a protein into a functional enzyme. The
sites for splitting the protein would be determined as described
in (1) above, and the cloning of the target gene fragments and
the addition of the intein domains would be carried out as
described in (2). In this
case the intein fragments need not cause splicing of the two
protein fragments to reconstitute enzyme activity. In one
preferred embodiment, the intein domains would be mutated
to abolish the possibility of splicing activity and would act only
as a facilitator of protein complementation. The intein splicing
activity could be abolished by mutating the amino acid
residues involved in the splicing reaction (Xu, et al., EMBO J.
15:5146-5153 (1996); Chong, et al., J. Biol. Chem.
271:22159-22168 (1996); Chong, et al., Biochem. Biophys. Res.
Commun., 259:136-140 (1999); Chong, et al., Gene, 192:271-281
(1997); Chong, et al., Nucleic Acids Res., 26:5109-5115
(1998); Chong, et al., J. Biol. Chem., 273:10567-10577
(1998); Chong and Xu, J. Biol. Chem., 272:15587-15590
(1997); Evans, et al., J. Biol. Chem., 274:18359-18363 (1999);
Evans, et al., J. Biol. Chem., 274:3923-3926 (1999); Evans, et
al., Protein Sci., 7:2256-2264 (1998); Evans, et al., J. Biol.
Chem., 275:9091-9094 (2000); Mathys, et al., Gene, 231:1-13
(1999); Paulus, Chem. Soc. Rev., 27:375-386 (1998); Perler,
et al., Nucleic Acids Res., 25:1087-1093 (1997); Pietrokovski,
et al., Protein Sci., 3:2340-2350 (1994); Pietrokovski, et al.,
Protein Sci., 7:64-71 (1998); Scott, Proc. Natl. Acad. Sci. USA,
96:13638-13643 (1999); Shingledecker, et al., Arch. Biochem.
Biophys. 375:138-144 (2000); Southworth, et al.,
Biotechniques 27:110-120 (1999); Telenti, et al., J. Bacteriol.,
179:6378-6382 (1997); Wood, et al., Nat. Biotechnol.,
17:889-892 (1999); Wu, et al., Biochim. Biophys. Acta
1387:422-432 (1998b); Wu, et al., Proc. Natl. Acad. Sci. USA
95:9226-9231 (1998a)).

In another embodiment the intein affinity domain could
retain its normal catalytic residues. Furthermore, the intein
may be a deletion or mutant form such that it is
significantly smaller or larger or contains non-native amino
acid residues when compared to its original primary
sequence. The deletion forms of the intein could be created
by sequentially decreasing the size of the intein, either at the
gene level or proteolytically, and then testing for affinity
activity. The affinity activity could be tested by using the split
herbicide resistance gene, fusing the new deletion mutant
to the appropriate herbicide resistance gene fragments and
looking for growth on the herbicide in question. Mutants of
the intein fragment could be formed by error prone PCR, linker
scanning, site directed mutagenesis, or by mutagenic
compounds, and the activity of the intein fragments tested as
described above. Note that the herbicide resistance gene could
be substituted by a drug resistance gene, green fluorescent
protein or any selectable marker. The affinity of the intein
fragments could also be tested by immobilizing one fragment
on a solid support and testing for the binding of the second
fragment to the first fragment.

(4) A method of screening for constructs producing 
active proteins of interest in a suitable host cell or 
organism 

The screen for the target gene activity will vary with the
target gene but could be performed by in vitro assay, following
expression and purification or in a crude cell lysate, or in vivo
by determining protein activity from cell phenotype, such as
viability, morphology, sensitivity or insensitivity to a drug or
compound, appearance, or ability to bind or not bind a specific
molecule or compound. One preferred method is to use E. coli
as host cells to test, for example, herbicide resistance activity
of the re-assembled product of a split gene. The E. coli cells
must be sensitive to the herbicide in question. The target
gene fragments, with the intein fusions, are present on a
plasmid or plasmids and are transformed into E. coli cells using
standard techniques.

The gene fusions are expressed either constitutively or
from an inducible promoter. E. coli cells are then tested for
growth under selection conditions, i.e. in the presence of
herbicide, in both the presence and absence of the appropriate
gene fragments. Growth in the presence of the gene fragments
indicates the reconstitution of the target protein activity. The
E. coli cells could be substituted with any bacterial, archaeal,
or eucaryotic cell type (either single or multicellular) as well as
a virus by employing techniques well known in the art.
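
The logic of this growth screen can be written down as a small decision
rule. The sketch below is an illustration only: the function name, the way
conditions are keyed, and the example results are hypothetical, and the
rule assumes that the single-fragment and no-fragment cultures serve as
negative controls under selection.

```python
def reconstitution_demonstrated(growth):
    """growth maps a tuple of fragments present to True/False growth under
    selection (e.g. in the presence of herbicide).

    Reconstitution is indicated only when cells carrying both fragments
    grow while cells carrying no fragment or a single fragment do not.
    """
    both = growth[("N", "C")]
    controls = [growth[()], growth[("N",)], growth[("C",)]]
    return both and not any(controls)


# Hypothetical plate results under selection.
observed = {
    (): False,           # host cells alone
    ("N",): False,       # N-terminal fusion only
    ("C",): False,       # C-terminal fusion only
    ("N", "C"): True,    # both fusions co-expressed
}
print(reconstitution_demonstrated(observed))
```

If any single-fragment control also grows, the rule returns False, which
would point to residual activity of that fragment rather than
reconstitution.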

Furthermore, both of the target gene fragments could
be present in the genome of the organism, or one fragment
could be present in the genome and the other in a plasmid or
some other vector. The target protein fragments could be
expressed in one organism together or separately and added
to another cell type for assay. The fusion could be tested
directly in plant cells or other multicellular organisms by
placing the transgene fragments in the host organism's
nuclear, chloroplast, or mitochondrial genome and
determining if the desired activity is present. The target gene
or protein fragments could be delivered by a bacterial, fungal,
viral, micellar, mechanical (biolistic) or similar vector to the cell
type or organism to be tested.

(5) Location of split genes

The present invention also comprises location of the
split target gene sequences in different cellular compartments,
different locations on the chromosome, or different vectors.
One preferred method is to position the two split gene
sequences in the nucleus, chloroplast, mitochondria, bacterial
artificial chromosome, yeast artificial chromosome, or plasmid,
preferably not both in any one of the aforementioned.
Location of fragments can be accomplished in accordance with
standard molecular biology techniques. In order to
reconstitute the gene product from its fragments, the
appropriate gene fragments must be fused to a
targeting/localization sequence so that their protein products
are transported into a cellular compartment (e.g., the
chloroplasts) where functional reconstitution can occur.

(6) A method of splitting the target gene into two or 
more fragments 

The present invention also embodies methods for
splitting the target gene into two or more fragments and
reconstituting the desired activity by trans-splicing, intein
mediated complementation or protein complementation of all
the necessary fragments. For example, inteins with differing
affinities could be attached to the target protein fragments so
that they reassemble the active protein, in a manner
described previously (Otomo, et al., Biochemistry,
38:16040-16044 (1999); Otomo, et al., J. Biomol. NMR, 14:105-114
(1999)). In this case each fragment could be located far apart
in the chromosome, on a separate chromosome or in multiple
locations as described above, except that the number of
locations could match the number of fragments the protein
was divided into.

(7) Use of protein complementation in the prevention
of transgene spread

This protocol uses the natural complementation activity
of two protein fragments to reconstitute the desired protein
property. The two genes encoding the protein halves may be
located in the nucleus, chloroplast, mitochondria, bacterial
artificial chromosome, yeast artificial chromosome, plasmid or
any combination of those organelles or vectors. Following
expression, both protein fragments may be targeted to the
site of protein action and the desired protein property
generated by complementation of the protein fragments.
Protein complementation has been reported previously (Rossi,
et al., Trends Cell Biol. 10:119-122 (2000)) and so makes a
viable alternative to using an intein as a complementation
domain. The procedures necessary to carry out this
experiment are similar to what has already been discussed
except no intein fusion is used. A site to split a target gene is
determined as described in (1). The transgene fragments are
cloned as described in (2), except that an intein is not used
as a fusion partner. The screening for activity of the split
protein is conducted as described in (4).

(8) Introducing a transgene into an organism by 
viral infection 

In yet another embodiment, the two transgene
fragments, either intein fusions or not, may be packaged into
separate viral particles. These viruses co-infect an organism
and both transgenes are expressed. The desired protein
property is generated following protein splicing, intein
mediated complementation, or protein complementation. One
preferred method comprises choosing the split site, cloning the
fragments and checking for activity in trans as described in (1),
(2), and (4). The appropriately split transgene or
transgene-intein fusions are packaged into adenovirus. The
adenoviruses containing the appropriate transgenes can be
introduced into a subject organism and upon transfection
introduce the two gene fragments so that the target protein
activity can be expressed.

BRIEF DESCRIPTION OF THE EXAMPLES 

In Example I, we demonstrate a method of splitting a
herbicide resistance gene by an intein. We show how to select
potential split sites in the E. coli herbicide resistance gene
encoding acetolactate synthase (ALS) based on
sequence homology analysis and the crystal structure of the
protein of interest or its analog. The DNA fragment encoding
the N-terminal 327 amino acid residues of the ALS protein
was fused in frame to the N-terminal 123 amino acids of the
Ssp DnaE intein while the DNA fragment encoding the
C-terminal 221 amino acid residues was fused in frame to the
C-terminal 36 amino acids of the Ssp DnaE intein. A plasmid
vector bearing only one of the fusion genes expressed an
inactive ALS protein fragment. When both fusion gene
vectors were introduced into the same host cell and
co-expressed, the two inactive fusion proteins underwent
trans-splicing to produce a functional enzyme in vivo, conferring
herbicide resistance to the E. coli host cells. This approach
may be applied to selection of suitable sites in any gene for
fusion to an intein sequence.

In Example II, we demonstrate how to choose a split
site in the maize ALS gene based on the sequence homology
of the maize ALS gene and its E. coli counterpart, the ALSII gene.
The DNA encoding the N-terminal 397 amino acid residues of
the maize ALS gene was fused in-frame to the DNA sequence
encoding the N-terminal 123 amino acids of the Ssp DnaE
intein while the DNA fragment encoding the C-terminal 241
amino acid residues was fused in frame to the DNA encoding
the C-terminal 36 amino acids of the Ssp DnaE intein. We
show that, when the two fusion genes were co-expressed,
the two fusion proteins underwent trans-splicing to produce a
protein product of expected size for the mature protein.

In Example III, we demonstrate a method of identifying
potential split sites in a mutant S. typhimurium aroA gene
encoding 5-enolpyruvyl-3-phosphoshikimate synthetase
(EPSPS) based on transposon random linker insertion. Two
sites at amino acid positions 215 and 235 of EPSPS among all
42 potential sites were chosen to split the EPSPS gene. The
DNA fragment encoding the N-terminal 215 or 235 amino acid
residues of the EPSPS protein was fused in-frame to the
N-terminal 123 amino acids of the Ssp DnaE intein while the DNA
fragment encoding the C-terminal 212 or 192 amino acid
residues of EPSPS was fused in-frame to the DNA encoding
the C-terminal 36 amino acids of the Ssp DnaE intein. When
only one half of the EPSPS gene, with or without the
intein fused, or the two complementary halves without intein,
were introduced into ER2799, the EPSPS was expressed as a
non-functional protein. However, when both halves of EPSPS
fused with either active or inactive intein halves were
introduced into ER2799, the EPSPS was expressed as a functional
protein and conferred resistance to the herbicide glyphosate,
indicating that the N- and C-terminal halves of the Ssp DnaE
intein facilitate the
complementation and reconstitution of the N- and C-terminal
halves of the EPSPS protein by bringing the EPSPS halves into
close proximity.
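
As a quick consistency check on the fragment sizes quoted in Examples I to
III above, the lengths of the two fusion proteins and of the spliced
product can be tallied. The sketch below uses only the residue counts
stated in the text. The function and dictionary names are hypothetical,
and the tally assumes that no extra residues (for example an added
initiator methionine or junction residues) are present.

```python
SSP_DNAE_N = 123  # N-terminal Ssp DnaE intein fragment, amino acids
SSP_DNAE_C = 36   # C-terminal Ssp DnaE intein fragment, amino acids

# (N-terminal target residues, C-terminal target residues) as quoted.
splits = {
    "E. coli ALSII (Example I)": (327, 221),
    "maize ALS (Example II)": (397, 241),
    "EPSPS split at 215 (Example III)": (215, 212),
    "EPSPS split at 235 (Example III)": (235, 192),
}


def fusion_lengths(n_res, c_res):
    """Lengths of both fusion proteins and of the trans-spliced product."""
    return {
        "n_fusion": n_res + SSP_DNAE_N,  # target N-part + intein N-part
        "c_fusion": SSP_DNAE_C + c_res,  # intein C-part + target C-part
        "spliced": n_res + c_res,        # intein fragments are excised
    }


table = {name: fusion_lengths(*parts) for name, parts in splits.items()}
for name, row in table.items():
    print(name, row)
```

Both EPSPS split sites give the same spliced length, as expected for two
ways of dividing the same protein.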

In Example IV, we describe a method in which two
unrelated gene products, such as
aminoglycoside-3-acetyltransferase (the enzyme responsible for
metabolism of the drugs spectinomycin and streptomycin) and
Aequorea victoria soluble
modified green fluorescent protein, could be trans-spliced into
one hybrid protein in an E. coli cell. The two genes are located on
two different plasmids with the respective trans-splicing elements
from the Ssp DnaE intein. The plasmids have two independent
mechanisms of expression. This hybrid protein confers
resistance to spectinomycin sulphate.

In Example V, we describe a method in which two
unrelated genes, such as aadA (encoding
aminoglycoside-3-acetyltransferase) and smGFP (soluble modified
green fluorescent protein), could be located on a single
E. coli-plant binary vector under the transcriptional and
translational control of a chloroplast promoter (PpsbA). Both
genes, when expressed, are capable of producing a hybrid
aminoglycoside-3-acetyltransferase-soluble modified green
fluorescent protein. Thus this method allows for rapid
trans-splicing screening of protein/protein fragments before
introduction into plant cells, using a promoter that can be
recognized by both the E. coli and plant cellular machinery.



In Example VI, we describe a method in which a
cis-splicing construct containing two fragments of either
5-enolpyruvyl-3-phosphoshikimate synthetase (EPSPS) or
acetolactate synthase (ALS) genes along with an Ssp DnaE
intein is capable of splicing into a mature protein in plant
cytoplasm. This experiment will reinforce the idea of
cis/trans-splicing in the cytoplasm. This technique would be
useful for proteins which need specific modification for
activity/folding in the
cytoplasmic environment. A part of the target protein gene
with the necessary transport signal and splicing element will be
placed in an organelle for cytoplasmic transport in the form of
a precursor polypeptide.



In Example VII, Section 1, we describe a method in
which two unrelated genes, such as aadA (encoding
aminoglycoside-3-acetyltransferase) and smGFP (soluble
modified green fluorescent protein), could be located on the
chloroplast genome and produce a hybrid protein via protein
trans-splicing. Success in this method will lead to
compartmentalization of protein/protein fragments and
trans-splicing of the functional protein. In addition, transforming
several separate genes in one vector to form a
multifunctional protein simplifies the engineering of novel
characters.

In Example VII, Section 2, we describe a method in
which two unrelated genes/gene fragments could be localized
in two different compartments in a plant cell, such as the
chloroplast and nucleus, and express the respective
protein/polypeptide. The nuclear encoded component is
tripartite, with a chloroplast transit peptide which will allow the
protein fragment to be synthesized in the cytoplasm and migrate
into the chloroplast for the trans-splicing event to occur. The
chloroplast half will be an integrated component of the
circular genome of the organelle. The resulting plants will not
be able to transfer the novel character of the newly
introduced transgene to any closely related species.

The present invention is further illustrated by the
following Examples. These Examples are provided to aid in
the understanding of the present invention and are not to be
construed as a limitation thereof.

The references cited above and below are hereby
incorporated by reference.

EXAMPLE I

Production of Functional Herbicide-resistant Acetolactate
Synthase in E. coli by Protein Trans-splicing

In this Example we demonstrate a method to split the
gene which encodes E. coli acetolactate synthase II (ALSII;
EC 4.1.3.18; acetohydroxyacid synthase), possessing a
herbicide-resistant mutation (Yadav et al., Proc. Natl. Acad. Sci.
USA, 83:4418-4422 (1986); Hill et al., Biochem. J.,
335:653-661 (1998)), by fusion with Ssp DnaE intein coding sequences
(Evans et al., J. Biol. Chem. 275:9091-9094 (2000); Scott, et
al., Proc. Natl. Acad. Sci. USA, 96:13638-13643 (1999)). We
were able to reconstitute a functionally active ALSII enzyme
through protein trans-splicing in the bacterium E. coli ER2744
(fhuA2 glnV44 e14- rfbD1? relA1? endA1 spoT1? thi-1
Δ(mcrC-mrr)114::IS10 lacZ::T7 gene1) (Figure 5). First, we show how
to select a potential split site in the acetolactate synthase II
gene based on the analysis of its sequence and structure
homology. Then we show how to design and carry out
experiments to analyze the protein trans-splicing activity of
the split ALS protein and how to assay the enzymatic activity
of reconstituted ALS. We demonstrate that the two portions
of the ALS fusion protein, produced from two separate
plasmid vectors, undergo trans-splicing to produce a protein
product of expected size for the mature protein. Furthermore,
co-expression of the split ALS gene fragments conferred
resistance to a herbicide in E. coli ER2744. This method
may be applied to the production of any protein of interest
utilizing trans-splicing inteins.

1. Cloning of wild-type E. coli ALSII and generation
of its herbicide resistant mutant

The initial step is to clone the wild type ALSII and to
create a herbicide resistant ALSII mutant carrying an Alanine26
to Valine substitution (Yadav et al., Proc. Natl. Acad. Sci. USA,
83:4418-4422 (1986); Hill et al., Biochem. J., 335:653-661
(1998)). E. coli strain MI162, containing an enzymatically active
copy of ALSII, was obtained from CGSC, the E. coli Genetic Stock
Center (Yale University, New Haven, CT). Genomic DNA was
extracted from E. coli strain MI162 using the QIAamp Tissue Kit
(Qiagen, Inc., Studio City, CA). DNA Polymerase Chain
Reaction (PCR) was performed on the E. coli DNA sample to
clone the full length ALSII using primers
5'-GGACGGGGAACTAACTATG-3' (SEQ ID NO:1) and
5'-CCACGATGACGCACCACGCG-3' (SEQ ID NO:2) and Vent® DNA
Polymerase (New England Biolabs, Beverly, MA). The ALSII
coding sequence was further amplified using primers
5'-GGAGGGGGCATATGAATGGCGCACAGTGGG-3' (SEQ ID NO:3) and
5'-GGGGGGTCATGATAATTTCTCCAAC-3' (SEQ ID NO:4) and cloned
into the NdeI and PstI sites of the pTYB1
plasmid (New England Biolabs, Beverly, MA), creating a vector,
pALSII. A shorter construct, pTYBT-ALSII, was obtained by the
removal of a 3-kb non-essential sequence from pALSII by
restriction digestion with PmeI and BstZ17I followed by self
ligation. The herbicide resistant mutation, Alanine26 to Valine,
was introduced in pTYBT-ALSII by site-directed mutagenesis
using the Quickchange Site-Directed Mutagenesis kit (Stratagene,
La Jolla, CA). The mutagenesis primers were
5'-CCGGGTGGCGTAATTATGCCGGTTTACG-3' (SEQ ID NO:5) and
5'-CGTAAACCGGCATAATTACGCCACCCGG-3' (SEQ ID NO:6). The mutated
ALSII (ALSIIm) coding sequence generated by partial NdeI
and PstI digestion of pTYBT-ALSIIm was ligated with pTYB1 to
produce an ALSIIm expression vector, pALSIIm.
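
The two mutagenesis primers (SEQ ID NO:5 and SEQ ID NO:6) can be checked against each other with a short script. The following Python sketch is illustrative only and is not part of the described method: it tests whether the two primers are exact reverse complements, as is usual for this type of mutagenesis kit, and translates SEQ ID NO:5 using the standard genetic code. The choice of reading frame (starting at the first base of the primer) is an assumption made for the illustration.

```python
# Illustrative check of the mutagenesis primers (SEQ ID NO:5 and SEQ ID NO:6).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq):
    """Translate complete codons starting at position 0; ignore leftover bases."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


PRIMER_5 = "CCGGGTGGCGTAATTATGCCGGTTTACG"  # SEQ ID NO:5
PRIMER_6 = "CGTAAACCGGCATAATTACGCCACCCGG"  # SEQ ID NO:6

primers_are_complementary = reverse_complement(PRIMER_5) == PRIMER_6

# Reading frame 0 is an assumption of this sketch, not a statement from the text.
encoded_peptide = translate(PRIMER_5)

print(primers_are_complementary)
print(encoded_peptide)
```

Under the assumed reading frame, the fourth codon of SEQ ID NO:5 (GTA) encodes valine, which is consistent with the primer carrying the Alanine26 to Valine substitution.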

2. Selection of Split Site

One preferred method for identifying a suitable split site
within any gene is to analyze the sequence homology of a
family of proteins and to examine its protein structure or the
structure of its homologues (Ibdah et al., Biochemistry,
35:16282-16291 (1996)). Sequence alignment and structure
comparison suggest that the ALS genes of bacteria, yeast
and higher plants share highly conserved regions (Figure 6;
only a partial sequence alignment is shown). Still, there
are highly variable regions present in the proteins, such as
the region around amino acid residues Q327 and C328 in
isoform II of the E. coli acetolactate synthase (Figure 6). E.
coli ALSII has a 10 amino acid gap in this region compared to
other homologues, and the flanking sequence has less
homology among ALS genes from different species (Figure 6).
Furthermore, analysis of the crystal structure of a homologue,
pyruvate oxidase, suggests that Q327 and C328 are likely to
be located in a linker structure between two intra-molecular
domains, away from the catalytic core (Ibdah et al.,
Biochemistry, 35:16282-16291 (1996)). We reasoned,
therefore, that ALSII split by an intein at this region may
retain the necessary flexibility to allow efficient protein
trans-splicing. In addition, insertion of a foreign protein sequence
into this location may have little or no effect on the structure
of the catalytic domain of ALSII and its enzymatic activity.
Thus the junction between amino acid residues Q327 and C328
was selected as one of the split sites for E. coli ALSII
(indicated by an arrow, Figure 6).



3. E. coli assay system

The isoform II of the E. coli acetolactate synthase that
possesses the mutation Ala26Val, referred to as ALSIIm,
confers resistance to sulfonylurea herbicides (SU), such as
sulfometuron methyl (SM), in E. coli strain ER2744. E. coli
strain ER2744 was employed as an in vivo model system for
assessing the activity of the herbicide resistant E. coli ALSII
gene, genetically modified by a linker insertion between Q327
and C328. E. coli ER2744 is derived from wild type E. coli K12,
which contains the active ALSI and ALSIII enzymes, but not an
active ALSII. ALSI and ALSIII are two isoforms of ALS in
E. coli, which are crucial for the synthesis of valine, isoleucine
and leucine (LaRossa and Schloss, J. Biol. Chem.
259:8753-8757 (1984)). Their activity is sensitive to valine feedback
inhibition. Therefore, when the growth medium is saturated with
100 ug/ml valine (Sigma, St. Louis, MO), ALSI and ALSIII are
inhibited and the cells stop growing. Introducing a
recombinant herbicide resistant ALSII (ALSIIm) into E. coli
cells rescues their growth, since ALSII is resistant to
valine inhibition.
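
The selection logic of this assay system can be summarized as a small decision function. The Python sketch below is a deliberately simplified model, not a description of the actual biochemistry: the function name, the three medium labels and the boolean flags are illustrative assumptions, and only the three media used in this Example are modelled.

```python
# Simplified, illustrative model of the ER2744 selection scheme.
MEDIA = ("M9", "M9+Val", "M9+Val+SM")


def predicted_growth(medium, alsii_active, herbicide_resistant):
    """Return True if growth is expected under the simplified model.

    medium              -- one of MEDIA
    alsii_active        -- the plasmid supplies an enzymatically active ALSII
    herbicide_resistant -- that ALSII carries the herbicide resistance mutation
    """
    if medium not in MEDIA:
        raise ValueError(f"unknown medium: {medium!r}")
    if medium == "M9":
        # No valine: the chromosomal ALSI and ALSIII are not inhibited.
        return True
    if not alsii_active:
        # Valine inhibits ALSI and ALSIII and nothing replaces them.
        return False
    if medium == "M9+Val":
        # ALSII is resistant to valine feedback inhibition.
        return True
    # Valine plus SM: only a herbicide resistant ALSII supports growth.
    return herbicide_resistant


CONSTRUCTS = {
    "no ALSII": (False, False),
    "ALSII (wild type)": (True, False),
    "ALSIIm": (True, True),
}

table = {
    name: [predicted_growth(medium, *flags) for medium in MEDIA]
    for name, flags in CONSTRUCTS.items()
}

for name, row in table.items():
    print(name, dict(zip(MEDIA, row)))
```

In this model, a construct is scored as functional if it restores growth on valine, and as herbicide resistant if it also restores growth on valine plus SM.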

4. Generation of a modified herbicide resistance 
ALS gene 

Inteins often require certain amino acid residues
flanking their N- and C-termini to achieve optimal splicing or
trans-splicing activity. For example, the intein from the dnaE
gene of Synechocystis species PCC6803 spliced efficiently
when 5 native residues were present at both its N- and
C-termini, while deletion of these residues inhibited splicing
activity to various extents (Evans et al., J. Biol. Chem.
275:9091-9094 (2000)). Inclusion of these optimal amino acid
residues at the splice junctions may be required for proficient 
splicing activity. The resulting product may therefore possess 
these residues at the ligation junction of two protein 
sequences. Thus, for each intein insertion site, it is necessary 
to assess if these extra amino acid residues will have an 
adverse effect on the activity of the product. 

ALSIIm-14 was constructed by insertion of a synthetic
DNA linker (New England Biolabs, Beverly, MA), encoding the
following 14 amino acid residues (NH2-LEKFAEYCFNKSTG-COOH
(SEQ ID NO:7)), into the ALSIIm coding sequence
between Q327 and C328. The herbicide resistance activity
of ALSIIm-14 was examined using E. coli ER2744 host cells
transformed by the plasmid expressing ALSIIm-14 protein. E.
coli ER2744 cells transformed with plasmids expressing the 
wild type ALSII and herbicide resistant ALSII (ALSIIm) were 
used as controls. 

Plate assays were conducted to examine the capability
of ALSIIm-14 to rescue E. coli ER2744 on M9 minimal medium
plates saturated with valine (100 ug/ml) or with valine plus
the herbicide SM (50 ug/ml, Supelco Park, Bellefonte, PA)
(Sambrook et al., (1989)). The M9 medium contains 2 ug/ml
thiamin, 2 mM MgSO4, 0.1 mM CaCl2, 0.2% glucose, 50 ug/ml
of kanamycin, 100 ug/ml of ampicillin and 0.3 mM IPTG. For
the plating assay, 100 ul of 25 mg/ml valine with or without
50 ul of 25 ug/ml sulfometuron methyl (SM) was spread on an M9
selection plate. To assay bacterial growth, overnight cultures
were streaked on M9 plates with or without valine and/or SM.
The plates were incubated at various temperatures (as
indicated in Figure 7) for 48 to 72 hrs before the pictures
were taken. On the plate supplemented with valine, cells
expressing either ALSII, ALSIIm or ALSIIm-14 were able to
grow (Figure 7-a). However, when both valine and SM were
applied, only strains expressing herbicide-resistant ALSIIm or
ALSIIm-14 were able to grow (Figure 7-b). These in vivo
results demonstrated that ALSIIm with 14 amino acid
residues inserted at the proposed split site rescued E. coli
ER2744 growth in the presence of valine and SM. Therefore,
ALSIIm-14 is functionally active and the 14 amino acid
insertion does not abolish its enzymatic activity.

5. Construction of ALSII-Intein fusion genes

Next, the E. coli ALSIIm gene was split and fused
in-frame to the N- and C-terminal halves of the Ssp DnaE intein
coding regions. The fusion genes were created using two
compatible E. coli expression vectors, pMEB10 and pKEB1,
which are capable of co-expressing two intein fusion genes in
the same E. coli host cell, as previously described by Evans et
al. (J. Biol. Chem. 275:9091-9094 (2000)). The DNA sequence
encoding an N-terminal fragment of 327 amino acids of the
herbicide resistant ALSII (ALSIIm) gene was fused in frame to
the coding region for the 7 amino acid residues flanking the
N-terminus of the Ssp DnaE intein, followed by the intein
N-terminal 123 amino acid residues (INn) (Figure 5). The DNA
sequence encoding the C-terminal 221 amino acid residues of
ALSIIm was fused in frame to the DNA sequence encoding the
C-terminal 36 amino acid residues of the Ssp DnaE intein
(INc) and the 7 amino acid residues flanking the C-terminus of
the intein (Figure 5). The ALSII N-terminal fragment was amplified
from pALSIIm using primers
5'-GGGGGTCATGAATGGCGCACAGTGGG-3' (SEQ ID NO:10) and
5'-GCGCGCTCGAGTTGATTTAACGGCTGCTGTAATG-3' (SEQ ID NO:11).
The amplified fragment was
digested and cloned into the NcoI and XhoI sites of pMEB16,
which contains the sequence encoding the N-terminal 123
amino acid residues of the Ssp DnaE intein. The resulting
vector pEA(N) expresses a fusion protein composed of the
ALSIIm N-terminal fragment and the DnaE N-terminal
fragment (ALSIIm(N)-INn). The ALSII C-terminal fragment
was amplified using primers
5'-GCGCGACCGGTTGTGACTGGCAGCAACACTGC-3' (SEQ ID NO:12) and
5'-GGGGGGCTGCAGTCATGATAATTTCTCCAAC-3' (SEQ ID NO:13).
The fragment was
digested with AgeI and PstI and then cloned into the AgeI
and PstI sites of pMEB9. The resulting plasmid pEA(C)
expresses a fusion protein composed of the Ssp DnaE intein
C-terminal fragment and the ALSII C-terminal fragment
(ALSIIm(C)-INc). A 1 kb XbaI-PstI fragment containing the
ALSIIm(C)-INc fusion gene was subcloned from pEA(C) into
the XbaI and PstI sites of the pKEB1 plasmid to produce a
kanamycin resistant expression vector pKEC3.
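
As an illustration of how the cloning primers relate to the restriction sites named above, the following Python sketch scans SEQ ID NO:10 to NO:13 for recognition sequences. The recognition sequences are the standard ones for these enzymes. BspHI (TCATGA) is not named in the text; it is included here as an assumption, because it produces overhangs compatible with NcoI-cut vector ends. A match only shows that the motif is present in the primer, not that the enzyme was actually used.

```python
# Illustrative scan of the cloning primers for restriction enzyme sites.
# All four recognition sequences are palindromic, so scanning one strand suffices.
SITES = {
    "BspHI": "TCATGA",  # assumption: not named in the text; NcoI-compatible ends
    "XhoI": "CTCGAG",
    "AgeI": "ACCGGT",
    "PstI": "CTGCAG",
}

PRIMERS = {
    "SEQ ID NO:10": "GGGGGTCATGAATGGCGCACAGTGGG",
    "SEQ ID NO:11": "GCGCGCTCGAGTTGATTTAACGGCTGCTGTAATG",
    "SEQ ID NO:12": "GCGCGACCGGTTGTGACTGGCAGCAACACTGC",
    "SEQ ID NO:13": "GGGGGGCTGCAGTCATGATAATTTCTCCAAC",
}


def sites_in(seq):
    """Return the sorted names of all enzymes whose site occurs in seq."""
    return sorted(name for name, site in SITES.items() if site in seq)


found = {name: sites_in(seq) for name, seq in PRIMERS.items()}

for name, enzymes in found.items():
    print(name, enzymes)
```

The forward and reverse primers of each pair carry the sites expected for directional cloning of the N-terminal fragment (an NcoI-compatible end and XhoI) and of the C-terminal fragment (AgeI and PstI).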



When pEA(N) and pKEC3 were co-expressed in E. coli
ER2744, it was predicted that trans-splicing of the two fusion
proteins would result in ligation of the two split halves of the
E. coli ALSIIm, with 14 amino acids present at the ligation
junction.
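
The expected sizes of the two fusion proteins and of the spliced product follow from the residue counts given above. The short Python sketch below works through this arithmetic. The average residue mass of 110 Da is a common rule of thumb and an assumption of this sketch, so the resulting masses are rough estimates only and are not measured values.

```python
# Illustrative arithmetic for the expected protein sizes.
AVG_RESIDUE_MASS_DA = 110  # rule-of-thumb average; an assumption of this sketch

ALS_N_RESIDUES = 327     # N-terminal ALSIIm fragment
ALS_C_RESIDUES = 221     # C-terminal ALSIIm fragment
INTEIN_N_RESIDUES = 123  # Ssp DnaE intein N-terminal fragment
INTEIN_C_RESIDUES = 36   # Ssp DnaE intein C-terminal fragment
N_FLANK = "LEKFAEY"      # 7 residues flanking the intein N-terminus
C_FLANK = "CFNKSTG"      # 7 residues flanking the intein C-terminus

# The two flanks together form the 14-residue linker of ALSIIm-14.
linker = N_FLANK + C_FLANK

n_fusion_length = ALS_N_RESIDUES + len(N_FLANK) + INTEIN_N_RESIDUES
c_fusion_length = INTEIN_C_RESIDUES + len(C_FLANK) + ALS_C_RESIDUES
native_length = ALS_N_RESIDUES + ALS_C_RESIDUES
spliced_length = ALS_N_RESIDUES + len(linker) + ALS_C_RESIDUES


def approx_mass_kda(residues):
    """Rough mass estimate in kDa from a residue count."""
    return residues * AVG_RESIDUE_MASS_DA / 1000


for label, length in [
    ("N-terminal fusion", n_fusion_length),
    ("C-terminal fusion", c_fusion_length),
    ("native ALSII", native_length),
    ("spliced ALSIIm-14", spliced_length),
]:
    print(f"{label}: {length} residues, about {approx_mass_kda(length):.1f} kDa")
```

The spliced product is 14 residues longer than native ALSII, which corresponds to a difference of roughly 1.5 kDa under this approximation.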



6. Characterization of protein trans-splicing
activity

To determine whether ALSII-DnaE intein fusion proteins
are able to trans-splice in E. coli cells to produce ALSIIm-14,
western blots were performed using rabbit antisera raised
specifically against either the N- or C-terminal fragment of
ALSII.

Two rabbit antisera were raised against peptides
derived from the N-terminal and C-terminal regions of ALSII,
respectively (COVANCE). These two peptides are 1)
NH2-CAQWVVHALRAQGVNTVFGYG-COOH (SEQ ID NO:8) derived from
the ALSII N-terminal sequence (amino acid residues Ala4 to
Tyr23) and 2) NH2-CVWPLVPPGASNSEMLEKLS-COOH (SEQ ID
NO:9) derived from the ALSII C-terminal sequence (amino acid
residues Val530 to Ser548). A single bacterial colony was
inoculated into LB medium supplemented with 100 ug/ml of
ampicillin and grown for 4 hrs at 37°C. Expression was then
induced by addition of IPTG to 0.3 mM final concentration.
Cells were further cultured
for 2-16 hours at 15°C. 20 ul of cell culture was removed,
mixed with 3X SDS loading buffer (New England Biolabs,
Beverly, MA) and boiled for 5 minutes, and 2 ul was loaded
onto a 12% Tris-glycine gel (Novex, San Diego, CA). Subsequently
proteins were transferred to a nitrocellulose membrane and
blocked with 5% dry milk for one hour at room temperature
(Sambrook et al., Molecular Cloning, (1989)). Immunoblotting
was performed using antiserum (1:20000 dilution) overnight
at 4°C in the presence of 1% dry milk. Blots were then
washed three times for 15 minutes each and incubated with
1:10000 diluted HRP-conjugated anti-rabbit secondary
antibody for 1 hour at room temperature. The reactions were
visualized with the Chemiluminescent Western Detection kit (New
England Biolabs, Beverly, MA).

In control cells cultured at 15°C, expression of ALSII
(Figure 8A, 8B & 8C, lane 2) was recognized specifically by
both antibodies. In cells bearing a single ALSII-intein fusion
vector and another control vector to confer both ampicillin and
kanamycin resistance, only ALS(N)-INn or ALS(C)-INc protein
was detected by anti-ALS(N) or anti-ALS(C) serum (Figure 8A,
lane 3, Figure 8B, lane 4). When ALS(N)-INn and ALS(C)-INc
were co-expressed, a 60 kD band, as expected for the spliced
product ALSIIm-14, reacted with antibodies raised against the
N-terminus and C-terminus of ALSII (Figure 8A & 8B, lane 5).
This band of ALSIIm-14, as predicted, exhibited a slightly
higher molecular weight than native ALSII. The data indicated
that trans-splicing occurred between the two ALSII-intein
fusion proteins. A non-specific protein reacting with
anti-ALS(N) was observed (Figure 8A and Figure 8C, lane 1 to lane
5).

Trans-splicing activity of the Ssp DnaE intein was
previously shown to be temperature sensitive (Evans et al., J.
Biol. Chem. 275:9091-9094 (2000)). The temperature
sensitivity of trans-splicing of the ALSII-Ssp DnaE intein
proteins was examined by western blot analysis using an
antiserum against the ALSII N-terminal fragment (Figure 8C). Cells
were transformed by plasmids expressing ALSII, or both
ALSIIm(N)-INn and ALSIIm(C)-INc. Expression of the ALSII
protein was induced at 37°C for 3 hours. Co-expression of
ALSIIm(N)-INn and ALSIIm(C)-INc was induced at 37°C for 3
hours, 30°C for 3 hours, 25°C for 6 hours, or 15°C for 16
hours. Cell extracts were treated with SDS sample buffer,
denatured at 95°C to 100°C for 5 minutes and then
subjected to electrophoresis on a 12% SDS-PAGE gel. A western
blot was probed using an antiserum raised against the ALSII
N-terminal fragment. Figure 8C includes the following
samples: cells with no ALSII (lane 1, control), ALSII (lane 2),
ALSIIm(N)-INn and ALSIIm(C)-INc (lane 3 to lane 6). The cell
culture temperatures are 37°C for lane 1 to lane 3, 30°C for
lane 4, 25°C for lane 5, and 15°C for lane 6.

In cells grown at 37°C, ALSIIm-14 was not detectable
(Figure 8C, lane 3). However, in cells cultured at 30°C, the
spliced product was observed together with a significant amount
of accumulated N-terminal fusion protein (Figure 8C, lane 4). In
cells cultured at 25°C and 15°C (Figure 8C, lane 5 and 6),
only the spliced product was detected, indicating a complete
conversion of the N-terminal fusion protein to the spliced
product. The ALSIIm(C)-INc protein was produced in excess
under all the expression conditions. The data demonstrated
that the Ssp DnaE intein was capable of mediating
trans-splicing of the N- and C-terminal ALSIIm protein segments to
form ALSIIm-14. The splicing reaction was inhibited when the
experiment was conducted at 37°C. Splicing appeared to be
more efficient when cells were cultured at 15°C to 25°C rather
than at 30°C.



7. Herbicide resistance in cells bearing the split
ALS gene

The next step was to determine whether the spliced
product, as the result of trans-splicing of the ALSIIm
(ALSIIm(N)-INn and ALSIIm(C)-INc) fusion proteins, would
render E. coli ER2744 resistant to valine and SM. The first
experiment was to test the effect of co-expression of
ALSIIm(N)-INn and ALSIIm(C)-INc fusion proteins on cell
growth in valine saturated M9 minimal medium. In a plating
assay (see Section 4), all transformed cells grew well on M9
medium in the absence of valine (Figure 9A-a). However, only
ALSII and its herbicide resistant mutant ALSIIm rescued
cell growth at both 30°C and 37°C (Figure 9A-b, 9A-c) in the
presence of valine. Significantly, co-expression of ALSII(N)-INn
and ALSII(C)-INc rescued cell growth on a valine plate at
30°C (Figure 9A-c) or lower temperatures (data not shown).
Furthermore, expression of ALSIIm or ALSIIm(N)-INn and
ALSIIm(C)-INc rescued cells from additional herbicide inhibition
(Figure 9A-d). Moreover, transformation with wild type ALSII
could not rescue cell growth from herbicide inhibition (Figure
9A-d). The control cells which expressed either ALSII(N)-INn or
ALSII(C)-INc alone did not grow on a valine plate (Figure
9A-b, 9A-c); neither did cells co-expressing native ALSII N- and
C-terminal segments which were not fused to the intein
(Figure 9A-b, 9A-c). The data indicate that co-expression of
ALSIIm(N)-INn and ALSIIm(C)-INc fragments is required for
trans-splicing and for generating a functional ALSII, which can
rescue cell growth from valine and herbicide inhibition.



A quantitative liquid culture assay was performed to
verify the results obtained from the plating assay. The liquid
assay was performed as follows. A single colony was used to
inoculate LB medium supplemented with kanamycin and
ampicillin and grown at 37°C for 4 hours. Expression was induced
by 0.3 mM IPTG and cell cultures were shifted to 30°C for another
2 hrs. Then, the equivalent of 200 ul of culture at OD600 8.0
was spun down,
washed one time with M9 medium and resuspended in 200 ul
M9 medium. 40 ul was aliquoted into 2 ml of appropriate
culture medium and grown for 24-72 hours before its OD600
was measured. The concentration of valine is 100 ug/ml and
that of SM is 50 ug/ml. At 30°C, all transformed cells grew equally
well in M9 minimal medium (Figure 9B). In valine saturated
M9 medium, wild type ALSII allowed cells to grow, but no
growth was observed when SM was added. However, the
expression of ALSIIm or co-expression of ALSIIm(N)-INn and
ALSIIm(C)-INc allowed cells to grow in valine M9 medium, as
well as in medium containing SM. In control experiments,
ALSIIm(N)-INn or ALSIIm(C)-INc alone, or co-expression of
ALSIIm N- and C-terminal fragments not fused to the intein, did not
rescue cell growth in valine containing medium. These data are in
agreement with those from the plating assay. To further
compare the growth kinetics of trans-splicing mediated cell
growth with wild type ALSII mediated cell growth, a time course
study was performed (Figure 9C). The data showed that ALSII
expressing cells have the fastest growth rate, followed by
ALSIIm expressing cells. ALS(N)-INn and ALS(C)-INc
transformed cells have slower growth rates compared to
ALSII wild type expressing cells, but not significantly slower than
the growth rate of ALSIIm expressing cells. Cells expressing
split ALSII, with no fusion to the intein, have very slow
growth. Therefore, from the plating and liquid assays we have
demonstrated that the Ssp DnaE intein can mediate ALSII
trans-splicing, which results in a functional, herbicide-resistant
ALSIIm-14 in vivo.

In conclusion, the data indicated that the two ALS-intein
fusion proteins, produced from two different loci, underwent
trans-splicing in a temperature-dependent manner to form a
full length, functional ALSIIm protein. E. coli host cells
possessing both ALSIIm fusion gene fragments showed the
herbicide resistance phenotype.

EXAMPLE II 

Trans-splicing of a maize Acetolactate Synthase in E. coli

In this Example, we demonstrate a method to produce a
full length maize acetolactate synthase by protein
trans-splicing in E. coli. We demonstrate how to choose a split site
in the maize ALS gene based on the sequence homology of
the maize ALS gene and its E. coli counterpart, the ALSII
gene. We show that, when the split maize ALS-intein fusion
genes were co-expressed, the two fusion proteins underwent
trans-splicing to produce a protein product of expected size
for the mature maize ALS protein.

1. Selection of split site.

It is important to demonstrate the trans-splicing of other
herbicide resistant genes, such as the maize acetolactate
synthase (cALS) gene, of which the herbicide resistant mutant
form has been utilized to genetically modify plants
(Bernasconi et al., J. Biol. Chem. 270:17381-17385 (1995)).
One preferred method for the identification of a suitable split
site within any gene is to analyze the trans-splicing activity of
a homologous gene from a different organism. We have
described in Example I that the E. coli ALSII gene, after being
split between Q327 and C328, can be reconstituted by the
trans-splicing activity of the Ssp DnaE intein in vivo. Sequence
alignment between E. coli ALSII and maize ALS was conducted
to search for the region in the maize ALS gene corresponding
to the split site of the E. coli ALSII gene. The result suggests
that Serine397 and Threonine398 align with the split site
(Glutamine327 and Cysteine328) of E. coli ALSII. Splitting the
maize ALS between Serine397 and Threonine398, as
indicated by a star (Figure 6), may yield two maize ALS-intein
fusion proteins which would be capable of proficient splicing.
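
Transferring a split site from one protein to a homologue amounts to mapping residue numbers through a pairwise alignment. The Python sketch below illustrates this step under stated assumptions: the function name is hypothetical, and the two aligned sequences are short invented toy sequences, not the ALSII or maize ALS sequences of Figure 6.

```python
# Illustrative mapping of residue numbers through a gapped pairwise alignment.
def map_aligned_position(aligned_a, aligned_b, position_a, gap="-"):
    """Map a 1-based residue number in sequence A to sequence B.

    aligned_a and aligned_b are the two rows of a pairwise alignment (equal
    length, gaps written as `gap`). Returns the 1-based residue number in B
    that is aligned with residue `position_a` of A, or None if that residue
    of A is aligned with a gap in B.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    count_a = 0
    count_b = 0
    for char_a, char_b in zip(aligned_a, aligned_b):
        if char_a != gap:
            count_a += 1
        if char_b != gap:
            count_b += 1
        if char_a != gap and count_a == position_a:
            return count_b if char_b != gap else None
    raise ValueError("position_a exceeds the length of sequence A")


# Toy alignment (hypothetical sequences, for illustration only).
# Sequence A has a two-residue gap relative to sequence B.
TOY_A = "MKT--QCLA"
TOY_B = "MRSGASTLV"

# A split between residues 4 (Q) and 5 (C) of A maps to residues 6 and 7 of B.
split_left = map_aligned_position(TOY_A, TOY_B, 4)
split_right = map_aligned_position(TOY_A, TOY_B, 5)
print(split_left, split_right)
```

In the toy case, the gap in sequence A shifts the numbering, so adjacent residues 4 and 5 of A correspond to adjacent residues 6 and 7 of B, in the same way that Q327 and C328 of E. coli ALSII correspond to differently numbered residues of maize ALS.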

2. Cloning of the maize ALS gene 

Reverse transcriptase polymerase chain reaction (RT- 
PCR) was carried out to clone the maize ALS cDNA. 

Total RNA was isolated from corn leaves using the
RNAqueous kit (Ambion, Inc., Texas). The RNA was then used
for first strand cDNA synthesis using the reverse primer 3-3
(5'-ATCAGTACACAGTCCTGCCATC-3' (SEQ ID NO:14)) and
Superscript Reverse Transcriptase (LTI-GIBCO BRL, Rockville,
MD). The first strand cDNAs were then treated with RNaseH
(LTI-GIBCO BRL, Rockville, MD) before being used as a
template in a PCR reaction. The PCR reaction was carried out
using the Expand Long Template PCR system (Boehringer
Mannheim, Germany). The primers used in this reaction were
Reverse Primer 3-3 and cALS 5-4 primer
(5'-GAGACAGCCGCCGCAACCAT-3' (SEQ ID NO:15)).



An aliquot of the PCR product was electrophoresed on
an agarose gel and a band of approximately 2 kb was
observed. This fragment was cloned into the TOPO 2.1 vector
(Invitrogen, San Diego, CA, manufacturer's protocol) to make
pCALS1. The sequence of pCALS1 was confirmed using M13
forward and reverse primers.



3. Construction of the maize ALS-intein fusions

The DNA encoding the N-terminal 397 amino acid
residues of the maize ALS gene was amplified by PCR using
forward primer 5'-GGGCCCATATGGCCACCGCCGCCGCCGCG-3'
(SEQ ID NO:16), reverse primer
5'-GGGCCCTCGAGGCTTCCTTCAAGAAGAGC-3' (SEQ ID NO:17), and the
template pCALS1
(Sambrook et al., Molecular Cloning, (1989)). A 1.2 kb PCR
product was cloned into the TOPO-blunt vector (Invitrogen, San
Diego, CA, manufacturer's protocol), resulting in TOPO-cALS(N).
Then TOPO-cALS(N) was digested with NdeI and XhoI. A 1.2
kb digested DNA fragment was recovered from low melting
agarose gel and fused in-frame to the DNA sequence
encoding the N-terminal 123 amino acids of the Ssp DnaE
intein, resulting in a vector, MEB10-cALS(N), which expresses
the N-terminal cALS-intein fusion protein, cALS(N)-INn. A DNA
fragment encoding the C-terminal 241 amino acid residues
of the maize ALS gene was PCR amplified using forward
primer 5'-GGGCCACCGGTACATCAAAGAAGAGCTTG-3' (SEQ ID
NO:18), reverse primer
5'-GGGGCTGCATTCAGTACACAGTCCTGCCATC-3' (SEQ ID NO:19), and the
template pCALS4. A 0.8 kb
PCR product was cloned into a TOPO-blunt vector (see
protocol above), resulting in TOPO-cALS(C). TOPO-cALS(C) was then
digested with AgeI and PstI. A 700 bp DNA fragment was
recovered from low melting agarose gel and was fused in
frame to the DNA encoding the C-terminal 36 amino acids of
the Ssp DnaE intein, resulting in a vector, MEB9-cALS(C).
MEB9-cALS(C) was further cut by XbaI and PstI, releasing a
1 kb fragment. This 1 kb fragment was cloned into the pKEB1
vector to create a kanamycin resistant expression vector for
the cALS-intein C-terminal fusion protein, cALS(C)-INc. The
same extra 7 amino acids, NH2-LEKFAEY-COOH (SEQ ID
NO:20) and NH2-CFNKSTG-COOH (SEQ ID NO:21), were also
present at the junctions of the N- and C-terminal cALS-intein
fusion proteins, respectively.

4. Trans-splicing of the maize ALS-intein fusion
proteins

Both ALS-intein fusion fragments, cALS(N)-INn and
cALS(C)-INc, described in Section 3, were co-expressed in E.
coli ER2744 under the same conditions as described in
Example I, Section 6. A western blot was performed to detect
the trans-splicing product (for the method, see Example I, Section 6).
On the blot, a fragment of 69 kD, which corresponds to the
size of the wild type cALS (Figure 10A and Figure 10B, lane
2), was detected in cells expressing both fusion proteins and
was recognized by rabbit antisera specifically raised against
two peptides derived from the N- and C-terminal sequences of
maize ALS (Figure 10A and Figure 10B, lane 5). A non-specific
protein reacting with the antiserum against the N-terminus of maize
ALS was observed (Figure 10A, lane 1 to lane 5). The
peptides used to raise antibodies are 1) ALS-N peptide
corresponding to the sequence from Lys66 to Ala85,
NH2-CKGADILVESLERCGVRDVFA-COOH (SEQ ID NO:22), and 2)
ALS-C peptide corresponding to the sequence from Ile619 to
Tyr638, NH2-CIPSGGAFKDMILDGDGRTVY-COOH (SEQ ID
NO:23). The full length cALS species was not detected in cells
expressing either the N- or C-terminal fusion protein alone (Figure 10A
and Figure 10B, lane 3 and lane 4). This demonstrated that
split maize ALS, like E. coli ALSII, when fused with the Ssp
DnaE intein, was also able to undergo trans-splicing to
produce the full length ALS.

In conclusion, the maize ALS gene was split by the Ssp
DnaE intein and cloned into two separate plasmid vectors.
When both fusion gene vectors were introduced into the
same host cell and co-expressed, the two fusion proteins
underwent trans-splicing to produce a full length cALS.

Although a functional assay is needed to determine the
activity of the spliced maize ALS protein in plants, this result
raises the possibility of successfully splitting a plant herbicide
resistant or disease resistant gene into two inactive gene
segments. These two gene fragments can be confined to
two separate cellular compartments, such as the chloroplast
and nucleus, or two separate loci on the chromosomes, or
two separate DNA vectors. This novel mode of gene expression
may greatly lessen the chance of spreading an intact active
transgene into other species.

EXAMPLE III 

The present Example details the feasibility of splitting
the aroA gene and regenerating the desired protein activity
using an intein. The experiment consisted of dividing the
gene encoding the mutant aroA gene at various positions and
fusing the gene encoding the N-terminal splicing domain of
the Ssp DnaE intein (INn) to the gene encoding the N-terminal
fragment of the EPSPS protein. Concurrently, the gene
encoding the C-terminal splicing domain of the Ssp DnaE
intein (INc) was fused to the gene encoding the C-terminal
fragment of the EPSPS protein. When the fusion genes were
placed on two separate plasmids and co-transformed and
co-expressed in the same bacterial cell, it was demonstrated
that those bacterial cells were resistant to the herbicide
glyphosate.

The cloning of Salmonella typhimurium aroA gene that 
confers resistance to glyphosate 

1. Creation of plasmid pEPS#l

The Salmonella typhimurium aroA gene with the C301 to
T mutation was acquired from the American Type Culture
Collection in the form of a cosmid in the bacterium Salmonella
choleraesuis subsp. choleraesuis (ATCC No. 39256). The
modified aroA gene was amplified from the cosmid by the
polymerase chain reaction using primers EPSP#1
(5'-GGATCCTAAGAAGGAGATATACCCATGGAATCCCTGACGTTACA-3' (SEQ ID
NO:24)) and EPSP#2
(5'-GTCGACGCTCTCCTGCAGTTAGGCAGGCGTACTCATTC-3' (SEQ ID NO:25)).
The PCR product was
inserted into the StuI site of the plasmid LITMUS 28 (New
England Biolabs, Inc., Beverly, MA). Following transformation
and plasmid preparation, sequencing revealed an unexpected
mutation (C103 to G) which was reverted using Stratagene's
(La Jolla, CA) Quick Change Site Directed Mutagenesis Kit and
primers EPSP#10
(5'-GCTTTGCTCCTGGCGGCTTTACCTTGTGGTAAAACCGC-3' (SEQ ID NO:26))
and EPSP#11
(5'-GCGGTTTTACCACAAGGTAAAGCCGCCAGGAGCAAAGC-3' (SEQ ID NO:27)).
Sequencing of DNA from the resulting colonies revealed that
the unexpected mutation had been reverted to the expected
C. This plasmid was termed pEPS#8 and used as the
acceptor plasmid in the subsequent transposition linker
scanning reactions.

2. Description of ER2799, an E. coli strain used to
test the aroA gene constructs

An E. coli strain that has the aroA gene deleted from its
chromosome was acquired from the Yale E. coli stock center
(E. coli strain AB2829, CGSC#2829, ID#8215). This strain was
made hsdR- and named ER2799. Because ER2799 lacks the
aroA gene, which is necessary for aromatic amino acid
synthesis, it does not grow on M9 minimal media. This strain
is used to test the various aroA gene constructs to see if the
new aroA gene can rescue the bacteria and allow growth on
minimal media either in the presence or absence of
glyphosate.

3. Finding a site to split the aroA target gene by
transposon based linker scanning

The first step in performing this experiment was to
determine the sites in the 5-enolpyruvyl-3-phosphoshikimate
synthetase (EPSPS) protein which could allow insertion of an
intein in cis. In cis refers to the fact that the complete intein
is inserted into the complete EPSPS protein. However, it was
not known which portions of the EPSPS protein itself would be
tolerant to extra amino acid residues. To determine where
the EPSPS protein could tolerate amino acid insertions, a new
technology, the GPS®-LS kit (available from New England
Biolabs, Inc., Beverly, MA), was used to randomly insert 5
amino acid residues throughout the EPSPS protein sequence.
An expression plasmid library was constructed with the EPSPS
gene carrying the randomly inserted 5 amino acids. This library
was transformed into E. coli strain ER2799 and applied to
plates containing M9 minimal media. ER2799 lacks the aroA
gene and will not grow on M9 minimal plates unless an active
EPSPS gene is supplied by plasmid transformation. The
ER2799 E. coli that grew following transformation with the
library should contain an EPSPS protein that is active with the
5 amino acid insertion. These were sequenced to determine
the position of the 5 amino acid insertion, and 42 unique sites
were discovered in the EPSPS protein that allowed growth of
ER2799 on M9 minimal plates (Figure 15). Furthermore,
another 19 unique sites were found that did not tolerate a 5
amino acid insertion (Figure 16).

4. Transposition Reaction 

The reaction was performed by adding 6 ul of 20 ng/ul
pEPS#8 (target DNA), 1.5 ul of 20 ng/ul PmeI donor DNA, 3 ul
of distilled water, 3 ul of 10X GPS®-LS buffer and 1.5 ul of
TnsABC* and mixing for 15 min at 37°C. 1 ul of Start Solution
was added and the reaction incubated at 37°C for 1 hour and
20 min. The reaction was stopped by heat inactivation for 15
min at 75°C. Following cooling of the reaction mixture to room
temperature and dialysis against water for 2 hours, the
reaction mixture was transformed into freshly-made ER2685
(fhuA2 glnV44 e14- rfbD1? relA1? endA1 spoT1? thi-1
Δ(mcrC-mrr)114::IS10 Δ(lacI-lacA)200 F'proA+B+ lacIq Δ(lacZ)M15
zzf::Tn10 (TetR)) cells by electroporation. The cells were
incubated for 1 hour at 37°C and then plated onto LB plates
containing ampicillin and kanamycin. Cell growth was allowed
to proceed at 37°C overnight. It was discovered that 10 ul of
reaction mixture gave over 10,000 colonies (enough to cover
all possible transposon insertion sites, 2840 sites in pEPS#8,
3.3 times) following transformation.
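
The fold-coverage figure can be turned into a rough estimate of how completely the library samples the possible insertion sites. The Python sketch below does this under a strong simplifying assumption that is not made in the text: every colony carries one insertion, chosen uniformly and independently among the 2840 sites. Real transposition may show target site preferences, so the result is an idealized estimate only.

```python
# Illustrative estimate of library completeness from fold coverage.
def expected_fraction_hit(colonies, sites):
    """Expected fraction of sites hit at least once.

    Assumes each colony carries one insertion drawn uniformly and
    independently from `sites` possible positions (an idealization).
    """
    return 1.0 - (1.0 - 1.0 / sites) ** colonies


SITES_IN_PEPS8 = 2840  # possible insertion sites, from the text
FOLD_COVERAGE = 3.3    # fold coverage, from the text

# Number of independent insertions implied by the stated fold coverage.
colonies = round(FOLD_COVERAGE * SITES_IN_PEPS8)

fraction = expected_fraction_hit(colonies, SITES_IN_PEPS8)
print(f"{colonies} insertions over {SITES_IN_PEPS8} sites")
print(f"expected fraction of sites hit at least once: {fraction:.3f}")
```

Under this idealized model, a 3.3-fold coverage would leave only a few percent of the possible sites unsampled.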

5. Isolating the DNA fragment (3.0kb) containing
the EPSPS gene plus transposon

All the transformants from the transposition reaction
were recovered using LB medium and 66% of the cells were
saved at -70°C by adding 20% glycerol. The rest were grown
in 500 ml of LB liquid medium containing 100 µg/ml ampicillin
and 50 µg/ml kanamycin at 37°C overnight. The cells were
harvested by centrifugation and the plasmid DNA was purified
(508 µg total) using a Qiagen Midi kit (Qiagen, Studio City,
CA). The 3.0 kb aroA gene-Transposon DNA fragment was
released by digesting the DNA (58 µg) with PstI, NcoI and AhdI
and isolated by gel-purification using agarase following
ethanol precipitation (4 µg DNA was recovered).

6. Cloning the aroA gene-Transposon 3.0 kb 
fragment into the pCYB3 vector 

The gel-purified 3.0 kb aroA gene-Transposon DNA
fragment was ligated into the NcoI to PstI sites of pCYB3
(5.2 kb), and transformed into ER2685 by electroporation after
drop dialysis for 2 hours. The electroporated cells were
incubated for 1 hour in LB medium. 250 µl of this cell
suspension was plated onto LB plates containing 100 µg/mL
ampicillin and 50 µg/mL kanamycin while another 5.5 ml was
inoculated into 1 liter of LB liquid medium with 100 µg/mL
ampicillin and 50 µg/mL kanamycin and grown at 37°C
overnight. The plasmid DNA library containing the transposon
within the aroA gene was isolated by Qiagen (Studio City, CA)
Midi kit (750 µg).

7. Screening the library for EPSPS protein that is
active with the 5 amino acid linker

105 µg of the library DNA was digested with PmeI to
remove the transposon from the aroA gene. This leaves 15
bases (or 5 amino acid residues) at the transposon insertion
site. A 7 kb fragment was recovered (in a final volume of 400
µl EB), self-ligated (86 µl out of 400 µl 7 kb fragment in a 100
µl rxn), transformed (30 µl of the 100 µl rxn) into E. coli strain
ER2799 and plated onto both LB and M9 minimal plates, each
containing 100 µg/mL ampicillin in the presence of 0.3 mM
IPTG. Following incubation at 37°C overnight, ca. 20% of the
original cells survived on M9 minimal plates as compared to
the LB plates. Individual colonies that grew on M9 minimal
media plates were analyzed by DraI digestion and DNA
sequencing to confirm the site of linker insertion into the
aroA gene.

42 different insertion sites were identified among 72
active individual clones that can tolerate 5 amino acid
residues inserted into the aroA gene and 19 different
insertion sites were identified among 39 inactive clones that
cannot grow on M9 minimal media selection plates (see
Figure 15 and Figure 16). Plasmids pCE-5-22, pCE-5-21, pCE-
5-35 and pCE-5-23 were the active clones that have 5 amino
acid residues incorporated into the EPSPS protein (aroA gene
product) at positions 182, 215, 235 and 267, respectively.
These four sites were chosen for further studies.

Construction of Ssp DnaE Cis- and Trans-splicing vectors 

1. Creation of vectors pCE182DnaE, pCE215DnaE, 
pCE235DnaE, and pCE267DnaE for Cis-Splicing 

This involved inserting an intein into the sites in the 
target protein that were discovered to tolerate 5 amino acid 
insertions. 

Four sites were chosen for further study (positions 182, 
215, 235, and 267). The full length Ssp DnaE intein was 
inserted into these sites and the EPSPS-intein fusion was 
tested for its ability to permit ER2799 cells to grow on M9 
minimal plates. Cells carrying the intein at each of the four
sites grew on M9 plates, indicating that the EPSPS protein
could tolerate the intein inserted at these positions (see
Figure 11 and Figure 14).

CE182 or CE215, which was the linear DNA of pCE-5-22
or pCE-5-21 with the exception that the five amino acid linker
at 182 or 215 has been removed, was generated by
polymerase chain reaction (PCR) from templates pCE-5-22 or
pCE-5-21 using primers 5'-GCCCCTAAAGACACAATTATTCGCG-3'
(SEQ ID NO: 28) and 5'-CAGCGGCGCCGTCATCAGCAGAGCG-3'
(SEQ ID NO: 29) for CE182 or 5'-GCGAACCACCACTACCAACAATTTG-3'
(SEQ ID NO: 30) and 5'-TATCTCCACGCCAAAGGTTTTCATT-3'
(SEQ ID NO: 31) for CE215. The Ssp DnaE intein gene
containing two native N-extein residues and three native
C-extein residues was amplified by PCR from pMEB8 (Evans, et
al., J. Biol. Chem., 275:9091 (2000)) using primers
5'-GAATATTGCCTGTCTTTTGGT-3' (SEQ ID NO: 32) and
5'-GTTAAAGCAGTTAGCAGCGAT-3' (SEQ ID NO: 33). The resultant
PCR fragment was phosphorylated with T4 polynucleotide
kinase, purified by QIAquick column (Qiagen, Inc., Studio City,
CA) and ligated into CE182 or CE215 to generate pCE182DnaE or
pCE215DnaE, respectively.
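
As an illustrative check of what the intein primers encode, the following sketch translates SEQ ID NO: 32 and the reverse complement of SEQ ID NO: 33 with the standard genetic code. The reading frame (starting at the first base of each) and the helper names are assumptions made for this example; they are not stated in the text.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from the first base; trailing bases are ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


FORWARD = "GAATATTGCCTGTCTTTTGGT"  # SEQ ID NO: 32
REVERSE = "GTTAAAGCAGTTAGCAGCGAT"  # SEQ ID NO: 33 (anneals to the coding strand)

if __name__ == "__main__":
    print(translate(FORWARD))
    print(translate(reverse_complement(REVERSE)))
```

In this frame the forward primer reads Glu-Tyr followed by Cys-Leu-Ser-Phe-Gly, and the reverse primer reads Ile-Ala-Ala-Asn followed by Cys-Phe-Asn, which is consistent with two N-extein and three C-extein residues flanking the intein.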

The Ssp DnaE intein gene containing four native
N-extein residues and three native C-extein residues was
amplified by PCR from pMEB8 using primers
5'-TGCTGAATATTGCCTGTCTTTTGG-3' (SEQ ID NO: 34) and
5'-CCGTTAAAGCAGTTAGCAGCGATAGC-3' (SEQ ID NO: 35). The
resultant PCR fragment was purified by QIAquick column
(Qiagen Inc., Studio City, CA) and ligated into the
gel-purified, PmeI-cut pCE-5-35 or pCE-5-23 vector DNA to
generate pCE235DnaE or pCE267DnaE, respectively.

2. Creation of vectors p215EN2/pEPS#28 and
p235EN2/pEPS#29 for Trans-Splicing:

Two plasmids were constructed with compatible origins
of replication. The N-terminal portion of the appropriate EPSPS
protein was fused to the N-terminus of the N-terminal Ssp
DnaE splicing domain (INn) and inserted into one plasmid. The
remaining C-terminal portion of the appropriate EPSPS protein
was fused to the C-terminus of the C-terminal splicing domain
of the Ssp DnaE intein (INc). This fusion was inserted into the
second plasmid. The plasmids were co-transformed into
ER2799 by electroporation. Expression of the fusion proteins
was under the control of an IPTG inducible pTac promoter.
The transformed cells grew on M9 minimal plates, liquid M9
minimal media, or liquid M9 minimal media supplemented with
glyphosate (Figures 11, 12, 13 and 14). This indicated that
the protein halves could generate an active EPSPS protein
when co-expressed in the same cell.

The 0.6 kilobase XhoI to PstI fragment of pMEB4 was
gel-purified using the QIAquick extraction kit and ligated into
the XhoI to PstI sites in the pCYB3 (New England Biolabs, Inc.,
Beverly, MA) vector to generate pCEN1. The NcoI site
between the Ssp DnaE intein and the chitin-binding domain
(CBD) was removed by PacI and SapI digestion of pCEN1
followed by T4 DNA polymerase treatment and self-ligation to
generate plasmid pCEN2. This vector contains the N-terminal
123 amino acid residues of the Ssp DnaE intein (INn) under
the control of the pTac promoter and confers resistance to
ampicillin.

p215EN2 or p235EN2 were constructed by ligating the
NcoI to KpnI fragment of pCE215DnaE or pCE235DnaE into
the same sites of pCEN2. p215EN2 or p235EN2 has the
N-terminus of EPSPS (residues 1-215 for p215EN2, 1-235 for
p235EN2) fused to the INn.

The NcoI to FspI fragment of pCYB3 was ligated into the
NcoI to DraI sites of pKEB1 to generate pKEB12 (NEB#1282).
A sample of pKEB12 plasmid transformed in E. coli strain
ER2566 has been deposited under the terms and conditions
of the Budapest Treaty with the American Type Culture
Collection on May 23, 2000 and received ATCC Patent Deposit
Designation No. PTA-1898. This vector has the C-terminal 36
amino acid residues of the Ssp DnaE intein (INc) fused to CBD
and confers resistance to kanamycin.

pEPS#28 and pEPS#29 were constructed by ligating the
BglII to PstI fragment of pCE215DnaE and pCE235DnaE into
the same sites of pKEB12. pEPS#28 or pEPS#29 has the
C-terminus of EPSPS (residues 216-427 for pEPS#28, 236-427
for pEPS#29) replacing the CBD in pKEB12 and attached to
the C-terminus of INc.

3. Creation of the EPSPS complementary constructs
pEPS#34 and pEPS#36.

When the EPSPS protein fragments, lacking the intein 
domains, were co-expressed in ER2799 cells, the cells failed 
to grow on M9 minimal plates, liquid M9 minimal media, or 
liquid M9 minimal media supplemented with glyphosate 
(Figure 12 and Figure 13). This indicated that EPSPS activity 
was absolutely dependent on the presence of both intein 
halves. 

DNA encoding the N-terminus of the EPSPS protein,
residues 1-235, (EPS235N) was amplified by PCR from
pCE235DnaE using primers
5'-GGATCCTAAGAAGGAGATATACCCATGGAATCCCTGACGTTACA-3'
(SEQ ID NO:36) and
5'-GATATCCTGCAGTTAACCTGGAGAGTGATACTGTTGACC-3' (SEQ ID NO:37).
The resultant PCR product was purified using a QIAquick PCR
kit, digested with NcoI and PstI, purified from an agarose gel
using the QIAquick extraction kit and ligated into the NcoI to
PstI sites of plasmid pCYB3 to generate pEPS#34.

Plasmid pEPS#36 was created by amplifying DNA
encoding the C-terminus of EPSPS, residues 236-427,
(EPS235C) by PCR from pC+E2 using primers
5'-GATATCCCATGGGACGCTATCTGGTCGAGGGCGATG-3' (SEQ ID NO: 38)
and 5'-GTCGACGCTCTCCTGCAGTTAGGCAGGCGTACTCATTC-3' (SEQ ID
NO: 39). The resultant PCR product was purified using the
QIAquick PCR kit, digested with NcoI and PstI, purified from
agarose gel and ligated into the NcoI to PstI sites of plasmid
pKEB12. Two extra residues, Met-Gly, were also incorporated
at the N-terminus of EPS235C due to the NcoI site for cloning.

4. Creation of Vectors Containing the Cis or Trans
"dead" Ssp DnaE intein at position 235
(pEPS#31, pEPS#33, pEPS#37).

Interestingly, trans-splicing was not required for activity,
because if three of the most highly conserved catalytic
residues of the Ssp DnaE intein were changed to alanine the
co-transformed ER2799 cells still grew. This result
demonstrates that the intein can act as an affinity domain to
bring the two EPSPS intein fragments together (Figure 12 and
Figure 13).

The Ssp DnaE intein gene containing four native
N-extein residues and three native C-extein residues was
amplified by PCR from pMEB8 using primers
5'-TGCTGAATATGCGCTGTCTTTTGGTACCGAA-3' (SEQ ID NO:40) and
5'-CCGTTAAACGCCGCAGCAGCGATAGCGCC-3' (SEQ ID NO:41). The
resultant PCR fragment was purified by QIAquick column
(Qiagen Inc., Studio City, CA) and ligated into the PmeI site
of plasmid pCE-5-35 to generate pEPS#31. This Ssp DnaE intein
contains three mutations, Cys1 → Ala/Cys+1 → Ala/Asn159 → Ala,
in the catalytic residues that eliminate its splicing activity.
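
The three alanine substitutions can be located by comparing the mutagenic primers (SEQ ID NO:40 and 41) codon by codon with the corresponding wild-type primers (SEQ ID NO:34 and 35). The sketch below is illustrative only. It assumes the primer sequences as read here (the run of T's in SEQ ID NO:40 is a reconstruction), a reading frame that starts at the second base of the forward primers and the third base of the reverse primers, and hypothetical helper names.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def codons(seq: str, offset: int) -> list:
    """Split seq into complete codons, starting at the given offset."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def changed_codons(wild: str, mutant: str, offset: int, antisense: bool = False) -> list:
    """List (wild codon, residue, mutant codon, residue) where the primers differ.

    For a reverse (antisense) primer each differing codon is reverse
    complemented so that it is reported on the coding strand.
    """
    changes = []
    for w, m in zip(codons(wild, offset), codons(mutant, offset)):
        if w != m:
            if antisense:
                w, m = reverse_complement(w), reverse_complement(m)
            changes.append((w, CODON_TABLE[w], m, CODON_TABLE[m]))
    return changes


WT_FWD = "TGCTGAATATTGCCTGTCTTTTGG"            # SEQ ID NO: 34
MUT_FWD = "TGCTGAATATGCGCTGTCTTTTGGTACCGAA"    # SEQ ID NO: 40 (as read here)
WT_REV = "CCGTTAAAGCAGTTAGCAGCGATAGC"          # SEQ ID NO: 35
MUT_REV = "CCGTTAAACGCCGCAGCAGCGATAGCGCC"      # SEQ ID NO: 41

if __name__ == "__main__":
    print(changed_codons(WT_FWD, MUT_FWD, offset=1))
    print(changed_codons(WT_REV, MUT_REV, offset=2, antisense=True))
```

Under these assumptions the forward primer carries a single Cys to Ala codon change, and the reverse primer carries one Cys to Ala and one Asn to Ala change, matching the three mutations listed above.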

5. Methods of Assaying EPSPS Activity

Plating assay for EPSPS activity. The presence of a
functional EPSPS protein could be determined in vivo using E.
coli strain ER2799, which lacks an endogenously active EPSPS
(see above). ER2799 cells alone fail to grow on M9 minimal
plates (supplemented with 0.3 mM IPTG). In the following
description, when M9 minimal plates are mentioned they also
contain 0.3 mM IPTG. Plasmid pC+E2, which contains the full
length wild type EPSPS gene with a C301 to T mutation, is
able to rescue growth of ER2799 on the M9 minimal plates
when introduced by transformation.

Assaying the Ssp DnaE cis-splicing constructs. Plasmids
pCE182DnaE, pCE215DnaE, pCE235DnaE, pCE267DnaE (0.05
µg of each) were transformed into E. coli ER2799 cells by
electroporation (Sambrook, et al., Molecular Cloning: A
Laboratory Manual, 2nd Edition, Cold Spring Harbor Laboratory,
NY: Cold Spring Harbor Laboratory Press (1989)), see Fig. 11.
0.8 mL of LB media was added to the transformed cells and
these were incubated at 37°C for 1 hour with shaking. 200
µL of this solution was plated onto either LB or M9 minimal
plates supplemented with 0.1 mg/mL ampicillin (Sambrook, et
al., Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold
Spring Harbor Laboratory, NY: Cold Spring Harbor Laboratory
Press (1989)). The plates were incubated for varying lengths
of time and at various temperatures, the most commonly
used being overnight at 37°C.




Assaying the Ssp DnaE trans-splicing constructs. The
activity of each EPSPS trans construct was assayed by
co-transforming the constructs to be tested into ER2799 and
plating on either an M9 minimal plate, containing 0.3 mM IPTG,
or an LB plate, both of which were supplemented with 0.1
mg/mL ampicillin and 0.05 mg/mL kanamycin. In cases where
only one plasmid contained the EPSPS gene or a portion of
the EPSPS gene, the complementary antibiotic resistance was
supplied by co-transforming the E. coli with either pCYB3 or
pKYB1 (New England Biolabs, Beverly, MA), neither of which
carries an EPSPS gene.

The plasmids used were: pC+E2, p215EN2, p235EN2,
pEPS#28, pEPS#29, pEPS#33, pEPS#37, pEPS#34, and
pEPS#36. These plasmids were co-transformed (Sambrook,
et al., Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold
Spring Harbor Laboratory, NY: Cold Spring Harbor Laboratory
Press (1989)) using 0.1 µg of the appropriate plasmids, in
various combinations, into ER2799 E. coli cells, and plated on
both LB plates and M9 minimal media plates, each containing
100 µg/mL ampicillin and 50 µg/mL kanamycin. The M9
minimal plate also contained 0.3 mM IPTG. Individual clones
were picked from each LB plate and streaked onto one M9
minimal media selection plate following incubation at 37°C
overnight or RT for 2-3 days. The combinations used were:
WT, pC+E2 and pKYB1 (New England Biolabs, Beverly, MA);
215NC, p215EN2 and pEPS#28; 215C, pEPS#28 and pCYB3;
235NC-Dead, pEPS#33 and pEPS#37; 235NC, p235EN2 and
pEPS#29; 235N, p235EN2 and pKYB1; 235C, pEPS#29 and
pCYB3; 235N-215C, p235EN2 and pEPS#28; and 235
complement, pEPS#34 and pEPS#36 (see Figure 12).
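
Because every plate contains both ampicillin and kanamycin, each combination must supply one plasmid of each resistance. A minimal sketch of that bookkeeping follows. The resistance assignments for the pCEN2-derived and pKEB12-derived plasmids follow the vector descriptions above; those for pC+E2, pEPS#34, pCYB3 and pKYB1 are inferred from the pairings and should be treated as assumptions, as should the names used in the code.

```python
# Resistance marker carried by each plasmid ("amp" or "kan").
RESISTANCE = {
    "p215EN2": "amp", "p235EN2": "amp", "pEPS#33": "amp",   # built in pCEN2
    "pEPS#28": "kan", "pEPS#29": "kan", "pEPS#37": "kan",   # built in pKEB12
    "pEPS#36": "kan",                                        # built in pKEB12
    "pEPS#34": "amp", "pCYB3": "amp", "pC+E2": "amp",       # assumed
    "pKYB1": "kan",                                          # assumed
}

# Plasmid pairs exactly as listed in the text.
COMBINATIONS = {
    "WT": ("pC+E2", "pKYB1"),
    "215NC": ("p215EN2", "pEPS#28"),
    "215C": ("pEPS#28", "pCYB3"),
    "235NC-Dead": ("pEPS#33", "pEPS#37"),
    "235NC": ("p235EN2", "pEPS#29"),
    "235N": ("p235EN2", "pKYB1"),
    "235C": ("pEPS#29", "pCYB3"),
    "235N-215C": ("p235EN2", "pEPS#28"),
    "235 complement": ("pEPS#34", "pEPS#36"),
}


def survives_double_selection(pair) -> bool:
    """True if the pair provides both ampicillin and kanamycin resistance."""
    return {RESISTANCE[p] for p in pair} == {"amp", "kan"}


if __name__ == "__main__":
    for name, pair in COMBINATIONS.items():
        status = "ok" if survives_double_selection(pair) else "missing a marker"
        print(f"{name}: {' + '.join(pair)} -> {status}")
```

With these assignments every listed pair carries both markers, so growth differences on M9 plates reflect EPSPS activity and not antibiotic sensitivity.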

Determination of ER2799 growth in liquid culture in the
presence or absence of glyphosate. Glyphosate resistance of
the 235 trans constructs was tested using the following
plasmid combinations: WT, pC+E2 and
pKYB1; 235NC-Dead, pEPS#33 and pEPS#37; 235NC,
p235EN2 and pEPS#29; 235N, p235EN2 and pKYB1; 235C,
pEPS#29 and pCYB3; and 235 complement, pEPS#34 and
pEPS#36. These plasmids were co-transformed into ER2799
E. coli cells as described above and plated onto LB plates
containing 100 µg/mL ampicillin and 50 µg/mL kanamycin. As
a control, pCYB3/pKYB1 were co-transformed into E. coli strain
ER2744, and plated on an LB plate containing 100 µg/mL
ampicillin and 50 µg/mL kanamycin. A preculture was
prepared for each transformation by inoculating a fresh
colony into LB medium supplemented with 100 µg/mL
ampicillin and 50 µg/mL kanamycin and growing at 30°C
overnight. Equal amounts of pre-culture (10-11 µL depending
on the cell density) were inoculated into freshly-made M9
minimal medium containing 100 µg/ml of ampicillin, 50 µg/ml
of kanamycin and 0.3 mM IPTG in the absence or presence of
different amounts of glyphosate. The growth of each construct
was measured by OD at 600 nm, see Figure 13.
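
One common way to compare such growth curves, though not one described in the text, is to estimate a doubling time from two OD600 readings taken during exponential growth. The sketch below is illustrative only: the function name is hypothetical, it assumes purely exponential growth between the two readings, and the values in the usage lines are synthetic.

```python
import math


def doubling_time(t1: float, od1: float, t2: float, od2: float) -> float:
    """Doubling time between two OD600 readings, in the time unit supplied.

    Assumes exponential growth between t1 and t2.
    """
    if od1 <= 0 or od2 <= od1 or t2 <= t1:
        raise ValueError("need increasing positive OD readings at increasing times")
    return (t2 - t1) * math.log(2) / math.log(od2 / od1)


if __name__ == "__main__":
    # Synthetic readings (minutes, OD600): a culture that doubles once per hour.
    print(doubling_time(0, 0.1, 120, 0.4))
```

A construct that grows more slowly in the presence of glyphosate would show a longer doubling time than the same construct grown without it.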

Growth of the cis 235 construct in M9 liquid minimal
media. Two plasmid vectors, one with a splicing competent
Ssp DnaE intein (235 cis) and another with a splicing
incompetent intein (235 dead), pCE235DnaE and pEPS#31,
respectively, were transformed into separate ER2799 E. coli
cells and plated on LB plates supplemented with 100 µg/mL
ampicillin and 50 µg/mL kanamycin. A preculture was
prepared for each transformation by inoculating a fresh
colony into LB medium supplemented with 100 µg/mL
ampicillin and 50 µg/mL kanamycin and growing at 30°C
overnight. Equal amounts of pre-culture (10-11 µL depending
on the cell density) were inoculated into freshly-made M9
minimal medium containing 100 µg/ml of ampicillin, 50 µg/ml
of kanamycin and 0.3 mM IPTG. The cell density was
determined at various times using the OD at 600 nm (see
Figure 14).

The NcoI to KpnI fragment of pEPS#31 was ligated into
the same sites in plasmid pCEN2 to generate pEPS#33.
Plasmid pEPS#37 was created by cloning the BglII to PstI
fragment of pEPS#31 into the same sites in plasmid pKEB12.

EXAMPLE IV 

Trans-splicing of two unrelated gene products,
aminoglycoside-3-acetyltransferase (aadA) and soluble
modified green fluorescent protein (smGFP), to give rise to
a functional hybrid protein in E. coli.

The aminoglycoside-3-acetyltransferase gene was fused to
the Ssp DnaE intein N-fragment (INn). The C fragment of the Ssp
DnaE intein (INc) was fused to the smGFP gene. The fusion
proteins could be translated as individual polypeptides from
the respective constructs. These fusion protein coding DNA
sequences were cloned into either pIH976 (Figure 17) or
pAGR3 (Figure 18) plasmids. Both plasmids (pIHaadE-N
(pIH976 containing aadA and the INn fragment) and pAGRE-
CsmGFP (pAGR3 containing INc and smGFP)) were
co-transformed into E. coli (Figure 19A). The transformed E. coli
were resistant to spectinomycin/streptomycin sulfate (Figure
19B). The cell extracts were made after 16 hrs of growth. The
proteins in the extract were separated on an SDS Tris-glycine
gel and blotted onto a PVDF membrane. This membrane was
probed with anti-GFP monoclonal antibodies. Trans-splicing
was observed in E. coli extracts where both plasmids
were introduced. As a result of trans-splicing, the fusion
product had a molecular mass identical with the calculated
cumulative mass of both proteins (Figure 19C).

The following protocol describes the production of
cassettes, pIHaadE-N (aminoglycoside-3-acetyltransferase
gene fused to DNA encoding INn), pAGRE-CsmGFP (DNA
encoding INc fused to smGFP gene), Western blotting
and detection.

Polymerase chain reaction (PCR) was used for cloning of
the open reading frames (ORFs) into the desired plasmids.
The reaction contained Vent® DNA polymerase buffer
supplemented with 2 mM magnesium sulfate, 200 µM dNTPs,
1 µM of each primer and 100 ng plasmid DNA in a total volume
of 50 µl with 2 units of Vent® DNA polymerase. Between 10 and
20 rounds of amplification were carried out using a Perkin-
Elmer GeneAmp PCR 2400 system (Emeryville, CA). The
following primers were used for amplification of the aadA gene
(aadA forward primer: GCCTTAATTAACCATGAGGGAAGCGGTGATCGCCG
(SEQ ID NO:47), aadA reverse primer:
TGCGGTCGACTTTGCCGACTACCTTGGTGATCTC (SEQ ID NO:48)). PCR
products were purified using a PCR purification kit (QIAquick
PCR purification) from Qiagen (Valencia, CA). Purified PCR
products were digested by Pac I and Sal I restriction enzymes
and cloned into pNEB193 (New England Biolabs, Inc., Beverly,
MA) plasmid. The clone containing the aadA gene was named
pNEBaad3. A similar protocol was used for amplification and
cloning of the smGFP gene using specific primers (smGFP
forward primer: CCCAAGCTTGGCGCCATGAGTAAAGGAGAAGAACTTTTCAC
(SEQ ID NO:49) and smGFP reverse primer:
GCGACCGGTTTATTTGTATAGTTCATCCATGCCATG (SEQ ID NO: 50)) into
pLITMUS 28 (New England Biolabs, Inc., Beverly, MA). The clone
containing the smGFP gene was named psmGFP7. Sequences for
both aadA and smGFP genes were verified by DNA sequencing.
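
The PCR recipe above can be turned into pipetting volumes with the usual dilution relation (final concentration × total volume = stock concentration × stock volume). The sketch below is illustrative only. It takes the dNTP and primer concentrations as 200 µM and 1 µM (micromolar is assumed here), and every stock concentration shown is a hypothetical value chosen for the example, not one given in the text.

```python
def stock_volume(final_conc: float, stock_conc: float, total_volume: float) -> float:
    """Volume of stock needed; final_conc and stock_conc must share units."""
    return final_conc * total_volume / stock_conc


def build_mix(total_ul: float = 50.0) -> dict:
    """Return component volumes in microliters for one reaction.

    All stock concentrations below are assumptions for illustration.
    """
    mix = {
        "10X Vent buffer": stock_volume(1, 10, total_ul),          # 10X stock to 1X
        "MgSO4, 100 mM stock": stock_volume(2, 100, total_ul),     # 2 mM supplement
        "dNTPs, 10 mM stock": stock_volume(0.2, 10, total_ul),     # 200 uM final
        "forward primer, 10 uM stock": stock_volume(1, 10, total_ul),
        "reverse primer, 10 uM stock": stock_volume(1, 10, total_ul),
        "plasmid template, 100 ng/ul": 100 / 100,                   # 100 ng
        "Vent polymerase, 2 U/ul": 2 / 2,                           # 2 units
    }
    mix["water"] = total_ul - sum(mix.values())
    return mix


if __name__ == "__main__":
    for component, volume in build_mix().items():
        print(f"{component}: {volume:.1f} ul")
```

With different stock concentrations only the numbers passed to `stock_volume` change; the water volume is always whatever remains of the 50 µl total.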

The intein from the dnaE gene of Synechocystis species
PCC6803 was PCR amplified. The amino terminal part of the
intein (amino acids 1-123) is referred to as INn and the
carboxy terminal part as INc (amino acids 124-159). The INn and
INc fragments were cloned into pLITMUS 28 and pNEB193,
respectively. The primer pairs for amplification of INn and INc
are listed (INn forward primer:
AGGGAATTCGTCGACAAATTTGCTGAATATTGCCTGTCT (SEQ ID NO: 51),
INn reverse primer:
GGCCTCGAGTTATTTAATTGTCCCAGCGTCAAGTAATG (SEQ ID NO: 52),
INc forward primer:
AGCTTTGTTTAAACCATGGTTAAAGTTATCGGTCGTAGATC (SEQ ID NO: 53),
INc reverse primer:
CAGCGTCGACGGCGCCGTGGGATTTGTTAAAGCAGTTAGCAGC
(SEQ ID NO:54)). The plasmids containing the INn and INc
fragments were pLitDnaE-N1 and pNEBDnaE-C2, respectively.



Fusion constructs of intein fragments and either aadA or
smGFP gene products were made in the following way: the
BamHI and SalI fragment (800 bp) from pNEBaad3 was ligated
into BamHI-SalI digested pLitDnaE-N1 to give rise to pAEN1.
In a similar way, the 150 bp insert (pNEBIN-c digested with
PstI and KasI) was ligated into PstI and KasI digested pLit
SmGFP5 to give rise to pGFPEC. Plasmid pAEN contains the aadA
gene in frame with INn and pGFPEC contains the smGFP gene in
frame with INc.

The fused genes were PCR amplified and cloned into
E. coli expression vectors. The inserts of pAEN and pGFPEC
were cloned into pIH976 (NcoI and SacI sites) and pAGR3
(EcoRI and SacII sites) vectors. The primers are listed
(aadA-INn forward primer: CATGCCATGGGGGAAGCGGTGATCGCCGAAG
(SEQ ID NO: 55), aadA-INn reverse primer:
ACGCGAGCTCTTATTTAATTGTCCCAGCGTCAAGTAATG (SEQ ID NO: 56),
INc-smGFP forward primer: CGAATTCTATGGTTAAAGTTATCGGTCGTAGATC
(SEQ ID NO:57), INc-smGFP reverse primer:
AGCCCGCGGTTATTTGTATAGTTCATCCATGCCATG (SEQ ID NO:58)).
The E. coli expression plasmids were pIH976-aadE-N and
pAGR-INc-smGFP, under the control of the Ptac promoter of the
host. Either of the plasmids or both together were
transformed into E. coli ER1992 (New England Biolabs, Inc.,
Beverly, MA) and plated on LB agar-ampicillin plates as well as
LB agar ampicillin and spectinomycin plates.
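
As a quick consistency check, each expression primer can be scanned for the recognition sequence of the enzyme used for cloning. The sketch below is illustrative only. The recognition sequences are the standard ones for NcoI, SacI, EcoRI and SacII; which primer carries which site is inferred from the cloning description, and the primer sequences are used as read here.

```python
# Standard recognition sequences (all palindromic, so strand does not matter).
SITES = {
    "NcoI": "CCATGG",
    "SacI": "GAGCTC",
    "EcoRI": "GAATTC",
    "SacII": "CCGCGG",
}

# Primer sequence and the enzyme site it is expected to introduce (inferred).
PRIMERS = {
    "aadA-INn forward (SEQ ID NO: 55)": ("CATGCCATGGGGGAAGCGGTGATCGCCGAAG", "NcoI"),
    "aadA-INn reverse (SEQ ID NO: 56)": ("ACGCGAGCTCTTATTTAATTGTCCCAGCGTCAAGTAATG", "SacI"),
    "INc-smGFP forward (SEQ ID NO: 57)": ("CGAATTCTATGGTTAAAGTTATCGGTCGTAGATC", "EcoRI"),
    "INc-smGFP reverse (SEQ ID NO: 58)": ("AGCCCGCGGTTATTTGTATAGTTCATCCATGCCATG", "SacII"),
}


def site_position(primer: str, enzyme: str) -> int:
    """0-based index of the enzyme's site in the primer, or -1 if absent."""
    return primer.find(SITES[enzyme])


if __name__ == "__main__":
    for name, (sequence, enzyme) in PRIMERS.items():
        print(f"{name}: {enzyme} site at index {site_position(sequence, enzyme)}")
```

Each primer contains its expected site a few bases from the 5' end, leaving a short overhang for the enzyme to bind.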



For Western blotting, E. coli cell extracts were mixed with
SDS loading dye with 1 mM DTT, boiled at 95°C for 5 min and
loaded on a 10-20% Tris-glycine-SDS gradient gel. The
proteins were blotted on an Immobilon-P membrane and
probed with an anti-GFP monoclonal antibody (Roche
Molecular Biochemicals, Indianapolis, IN) followed by
chemiluminescent detection of the GFP and aadA-GFP fusion
protein.



EXAMPLE V 

Utilization of plant promoters in E. coli for trans-splicing of
two unrelated gene products, aminoglycoside-3-
acetyltransferase (aadA) and soluble modified green
fluorescent protein (smGFP), to give a functional hybrid
protein.

The above DNA fragments were cloned downstream of
the chloroplast specific promoter PpsbA (SEQ ID NO: 59). A
terminator sequence of the same gene (TpsbA (SEQ ID
NO:60)) was placed downstream of the cloned gene. The two
genes were expressed in opposite directions to avoid
read-through. The plant promoters were functional upon
transformation into E. coli and trans-spliced products (aadA-
smGFP fusion protein, 57 kDa) were observed in a Western blot
assay using anti-GFP antibodies. Thus chloroplast specific
promoters are functional in E. coli and could be used for gene
expression studies.

The following protocol describes the production of an
E. coli/plant shuttle vector (pNCT114/pNCT224) that is capable
of homologous recombination of a transgene(s) in vivo.

A shuttle vector consists of elements that will make it
functional in both E. coli and plant cells. Plasmid
pLITMUS28 (New England Biolabs, Inc., Beverly, MA) is the
backbone for the pNCT114 and pNCT224 gene targeting
vectors. The vector DNA comprises at least (1) two DNA
sequences homologous to the plastid genome (also referred
to as targeting sequences/fragments), (2) one or more promoter
elements, (3) transcription terminator elements, and (4) one or
more selectable/drug resistance (non-lethal) marker genes.

Promoter element (PpsbA) DNA sequences were PCR
amplified from genomic DNA extracted from 7 day old tobacco
seedlings using the CTAB method as described by Murray and
Thompson (Nucleic Acids Res., 8:4321-4325 (1980)). The
primers used for amplification are listed (PpsbA forward
primer: AACTGCAGGAATAGATCTACATACACCTTGG (SEQ ID
NO:64), PpsbA reverse primer:
CCGCTCGAGCTTAATTAAGGTAAAATCTTGGTTTATTTAATC (SEQ ID NO:65)).
Similarly the terminator sequence (TpsbA) was amplified by PCR
and cloned. The primers used for amplification are listed (TpsbA
forward primer: GCGACCGGTGATCCTGGCCTAGTCTATAGGAGG
(SEQ ID NO: 66), TpsbA reverse primer:
AGGCCTAGGAGAATACTCAATCATGAATAAATGC (SEQ ID NO:67)). A vector
with a psbA promoter and terminator DNA sequence allows genes
to be cloned in between these for expression of the protein. The
targeting DNA sequences were amplified and inserted outside
of the promoter and terminator in a flanking manner (Figure
20), thus facilitating homologous recombination of the
transgene at a predetermined locus. pNCT114 contains 16SrDNA-
trnaV and rps7/12 targeting sequence (SEQ ID NO:61),
whereas pNCT224 contains orf228-ssb as left border and
orf1244 as right border (SEQ ID NO:62). The following primers
were used for PCR amplification of the targeting sequences.

Primers for pNCT114

Left border forward primer:
TTGGCGCGCTTGACGATATAGCAATTTTGCTTGG (SEQ ID NO: 68)

Left border reverse primer:
TTGCGTACGATTTATCTCAGATTAGATGGTCTAG (SEQ ID NO: 69)

Right border forward primer:
TTGCCTAGGCGTATTGATAATGCCGTCTTAACCAG (SEQ ID NO: 70)

Right border reverse primer:
AGGGGTACCGAATTCAAGATTCTAGAGTCTAGAG (SEQ ID NO: 71)

Primers for pNCT224

Left border forward primer:
TTGGCGCGCAATTCACCGCCGTATGGCTGACCGG (SEQ ID NO: 72)

Left border reverse primer:
TTGCGTACGCCTTTGACTTAGGATTAGTCAGTTC (SEQ ID NO: 73)

Right border forward primer:
TTGCCTAGGGTCGAGAAACTCAACGCCACTATTC (SEQ ID NO: 74)

Right border reverse primer:
AGGGGTACCATCACGATCTTATATATAAGAAGAAC (SEQ ID NO: 75)

A detailed diagram for pNCT114/224 is in Figure 20A.
Both plasmids contain two promoter and two terminator
DNA fragments. For directional cloning, unique restriction
enzyme sites are incorporated. Plasmids pNCT114 and
pNCT224 have unique restriction enzyme sites (PmeI-AgeI and
PacI-XhoI sites). The insert from plasmid pAEN (aadA gene in
frame with INn) was obtained by digesting with PacI-XhoI, the
insert from pGFPEC (smGFP in frame with INc) was obtained by
digesting with PmeI-AgeI, and the two were ligated sequentially
into pNCT114 or pNCT224. The plasmids are designated as
p115ag and p225ag (Figure 21A). The plasmids were transformed
into E. coli and selected with ampicillin and spectinomycin
(Figure 21B). The cell extracts were made from overnight
cultures and separated on a 10-20% Tris-glycine-SDS gradient
gel. The proteins were blotted on an Immobilon-P membrane and
probed with anti-GFP monoclonal antibodies (Roche
Molecular Biochemicals, Indianapolis, IN) followed by
chemiluminescent detection of the GFP and aadA-GFP fusion
proteins (Figure 21C).

EXAMPLE VI 

Cis-splicing of the EPSPS and ALS gene products in plant 
cytoplasm expressed from a DNA cassette integrated into 

molecular DNA 

The introduction of DNA into plant nuclei has been
achieved in many different ways, such as electroporation,
polyethylene glycol mediated, Agrobacterium mediated,
microinjection and biolistic transformation. In accordance with
the present invention, one should determine if the plant
cytoplasm will mediate a protein-splicing event in cis or trans.
This will be a prerequisite for further trans-splicing
technologies in plants. This technique will be useful if the
target protein needs specific cytoplasmic modification for
activity. Any of the above techniques may be employed to
introduce the EPSPS and/or ALS gene cassettes into tobacco
or any other suitable plant tissue or cells. The general
cassette consists of: (1) a drug selection/degrading marker
gene such as kanamycin or any other suitable selection
marker; (2) a strong promoter element such as CaMV 35S
(cauliflower mosaic virus); and (3) right and left border T-DNA
repeats of Agrobacterium. Such a cassette could be
introduced into plants either by a biolistic process or by
Agrobacterium mediated gene transfer (Horsch, et al., Science
227:1229-1231 (1985)). The cassette is based on the pBI121
gene transfer vector (Jefferson, et al., EMBO J., 6:3901-3907
(1987)). The design of the final cassette is illustrated in Figure
22.

In the biolistic process, the transforming DNA is coated
on the surface of fine gold particles and introduced into the
plant cell by a particle accelerator gun (PDS-1000/He gun,
Bio-Rad, Richmond, CA). For Agrobacterium mediated gene
transfer the transforming DNA cassette is introduced into the
bacteria. The Agrobacterium harboring the cassette is allowed
to be in contact with a disk or tissue section from tobacco or
other suitable plant leaves. This facilitates the transfer of the
DNA cassette to the plant nuclei. In either of the above
approaches, the DNA finally gets integrated into the plant
nuclei. The putative transformed cells are used for marker
gene (drug) selection. The plants regenerated in the presence
of the selected drugs are strong transgenic candidates. After
the plants are mature, the cell extracts will be taken and
mixed with SDS loading dye with 1 mM DTT, boiled at 95°C for
5 min and loaded on a 10-20% Tris-glycine-SDS gradient gel.
The separated proteins will be blotted on an Immobilon-P
membrane and probed with an anti-ALS or EPSPS antibody.
PCR may then be performed to determine if the gene has
integrated in a predictable fashion without rearrangement.

This technique would be useful for proteins that need
specific modification for activity/folding in the cytoplasmic
environment. A part of the target protein gene with the
necessary transport signal and splicing elements will be
placed in an organelle for cytoplasmic transport in the form of
a precursor polypeptide.

These plants are allowed to grow in the greenhouse until
they mature and the seeds will be collected. The collected
seeds are then germinated and F1 plants tested for herbicide
resistance. A small-scale trial may be done to see whether or
not the segregation pattern of the introduced transgenes
follows a Mendelian inheritance pattern. Integration into
nuclear DNA would yield Mendelian inheritance, whereas
integration into chloroplast DNA would yield non-Mendelian
maternal inheritance.
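
One way such a segregation trial could be scored is a chi-square goodness-of-fit test against an expected ratio. The sketch below is illustrative only. The 3:1 resistant-to-sensitive expectation assumes a single dominant nuclear insertion in a self-pollinated parent, which the text does not specify, and the counts in the usage lines are hypothetical.

```python
# Critical value of chi-square for 1 degree of freedom at the 5% level.
CRITICAL_1DF_5PCT = 3.841


def chi_square(observed, expected_ratio) -> float:
    """Chi-square statistic for observed counts against an expected ratio."""
    total = sum(observed)
    ratio_total = sum(expected_ratio)
    expected = [total * r / ratio_total for r in expected_ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def consistent_with_3_to_1(resistant: int, sensitive: int) -> bool:
    """True if the counts do not differ significantly from a 3:1 ratio."""
    return chi_square([resistant, sensitive], [3, 1]) < CRITICAL_1DF_5PCT


if __name__ == "__main__":
    # Hypothetical seedling counts, for illustration only.
    print(consistent_with_3_to_1(78, 22))   # close to 3:1
    print(consistent_with_3_to_1(100, 0))   # uniform progeny, as in maternal inheritance
```

Uniformly resistant progeny from a transformed seed parent would fail the 3:1 test and point toward the maternal pattern expected for a chloroplast insertion.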

EXAMPLE VII

Trans-splicing of a split gene, such as EPSPS/ALS, or of two
unrelated gene products, such as aminoglycoside-3-
acetyltransferase (aadA) and soluble modified green
fluorescent protein (smGFP), to give a functional hybrid
protein in plant chloroplast.



The aim of these experiments is to investigate if trans-
splicing is feasible in plant chloroplasts. Plant chloroplasts are
similar to bacteria with respect to their transcription and
translation machinery. In Examples IV-VI, we have used the
naturally occurring intein from the dnaE gene of
Synechocystis species PCC6803, which is a cyanobacterium.
Cyanobacteria are photosynthetic bacteria which are similar
to plant chloroplasts. Thus it should be possible for inteins to
splice or trans-splice in plant chloroplasts. These proposed
experiments are in two sections: Section 1, to demonstrate
the trans-splicing event of two unrelated gene products, aadA
and smGFP, in plant chloroplasts, where both genes are
integrated in the chloroplast genome; and Section 2, trans-
splicing in the chloroplast, where the smGFP gene cassette is
integrated into the nuclear genome and the translated
protein containing a transit peptide (rubisco 3A-INc-smGFP) is
imported into the chloroplast for the reaction to proceed. The
chloroplast will have the aadA gene fused to the INn fragment.
The detailed protocol is described below.



To demonstrate trans-splicing of two unrelated gene
products, aminoglycoside-3-acetyltransferase (aadA) and
soluble modified green fluorescent protein (smGFP), in
chloroplasts, upon transcription and translation in
chloroplasts.

The plasmids are designated p115ag and p225ag, as
in Example V. These plasmids will be delivered into plant
organelles using a biolistic device. Tobacco or any other
suitable plant tissue will be harvested aseptically from sterile
greenhouse-grown plants or tissue culture plant cells. Plant
tissue will be equilibrated overnight with plant growth
medium and sorbitol or any other suitable osmoticum. The
plant cells will be bombarded with the above plasmids coated
on gold particles. After a suitable recovery time the cells will
be placed on plant growth medium along with phytohormone
and spectinomycin sulphate (500 µg/ml). The
spectinomycin-resistant callus tissue will be harvested and
placed on shoot differentiation medium. When shoots are about
2 cm in length they will be dissected out and placed in rooting
medium. The transgenic plants or sectors of the plants will be
identified with a hand-held UV lamp (a normal (non-transgenic)
plant will fluoresce red under UV, whereas a transgenic plant
will look green). Transgene integration and copy number
will be verified by Southern blot analysis and PCR. The
transgenic sectors will be tested for trans-splicing of aadA
and smGFP using anti-GFP antibody. These sectors would
further be used for generating a pure transplastomic line.
The F1 plants will be tested for spectinomycin resistance.

Trans-splicing in the chloroplast. The smGFP gene
cassette is integrated into the nuclear genome and the
translated protein containing the rubisco 3A transit peptide
(rubisco 3A-INc-smGFP) is imported into the chloroplast for the
reaction to proceed.

This method will enable any split protein (e.g., EPSPS or
ALS) to be expressed as fusion proteins with either INn or
INc, either in chloroplasts or in the nucleus. The nuclear-encoded
component will be fused to a chloroplast transit peptide to
facilitate its migration into the chloroplast after translation in
the cytoplasm. A detailed method for aadA and GFP is given
below. Similar methods could be followed for any other
protein/split genes.
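
The arrangement described above can be pictured with a toy model: the N-terminal half of the target protein is fused to INn, the C-terminal half is fused behind INc, and the target protein is restored only when one fusion of each kind meets in the same compartment. This is a minimal sketch for illustration, not a simulation of splicing chemistry. The class, the function names and the compartment labels are hypothetical, and the example sequence is an arbitrary string.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Fusion:
    """One half of a split protein fused to an intein fragment."""
    extein: str        # amino acids of the target-protein half
    intein_part: str   # "INn" or "INc"
    encoded_in: str    # genome carrying the fusion gene (label only)


def make_fusions(protein: str, split_index: int) -> Tuple[Fusion, Fusion]:
    """Split a protein at split_index and attach INn / INc to the halves."""
    if not 0 < split_index < len(protein):
        raise ValueError("split site must lie inside the protein")
    n_fusion = Fusion(protein[:split_index], "INn", "chloroplast")
    c_fusion = Fusion(protein[split_index:], "INc", "nucleus")
    return n_fusion, c_fusion


def trans_splice(a: Fusion, b: Fusion) -> Optional[str]:
    """Return the rejoined protein, or None if INn and INc are not both present."""
    parts = {a.intein_part: a, b.intein_part: b}
    if set(parts) != {"INn", "INc"}:
        return None
    return parts["INn"].extein + parts["INc"].extein
```

In this model, calling `trans_splice` on the two fusions returned by `make_fusions` gives back the original string, whereas two copies of the same fusion give `None`, reflecting the point that neither fragment alone yields the complete protein.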

This method will require a nuclear transformation vector,
such as pBI121, carrying a drug selection marker and the
target gene of interest. Our experimental gene will be a
three-part fusion protein with the rubisco transit peptide
followed by INc and smGFP (in place of smGFP, another
protein/peptide such as half of EPSPS or ALS could be
substituted). The transit peptide is codon optimized for tobacco
(Figure 26). This fusion gene will be under the control of a
strong plant promoter, CaMV 35S. A diagram of such a cassette
is shown (Figure 23). This DNA will be introduced into the plant
nucleus. The stable transgenic lines will be selected and F1
progeny will be tested for transgene integration.

Leaf sections from the above transgenic plants will be
used for chloroplast DNA transformation. The chloroplast gene
targeting vectors are based on p114 and p224, with a
spectinomycin resistance gene and a PpsbA promoter to drive
the transgene. The transgene could be the other half of the
protein (that was introduced into the nuclear genome
previously) with the necessary splicing elements. As a model
system we would use the aadA-INn fusion gene for
chloroplast transformation. The transplastomic lines will be
selected using both drugs (e.g., the chloroplast-specific drug
spectinomycin and the nuclear-specific drug kanamycin). PCR
and Western blot analysis will further establish pure plant
lines.

For the transgenic plants the F1 generation will be
tested for: (1) Mendelian inheritance pattern of the
transgene/segment; (2) stability of the transgene; and (3)
possible escape of the transgene through pollen.
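
The reasoning behind the pollen-escape test can be sketched as a small enumeration. The model below assumes strict maternal inheritance of plastid DNA, so that pollen can carry at most the nuclear fragment; this is an assumption of the sketch, and the test above exists precisely to check it experimentally. All names are hypothetical.

```python
PLASTID = "plastid_fragment"   # e.g. aadA-INn in the chloroplast genome
NUCLEAR = "nuclear_fragment"   # e.g. transit peptide-INc-smGFP in the nucleus


def pollen_types(nuclear_hemizygous: bool = True) -> list:
    """Gene fragments carried by pollen of the split-gene plant.

    Assumes plastid DNA is never transmitted through pollen.
    """
    types = [frozenset({NUCLEAR})]
    if nuclear_hemizygous:
        types.append(frozenset())  # pollen that did not inherit the insert
    return types


def can_reconstitute(fragments) -> bool:
    """The active protein needs both fragments in the same plant."""
    return {PLASTID, NUCLEAR} <= set(fragments)


def outcross_progeny(recipient=frozenset(), nuclear_hemizygous: bool = True) -> list:
    """Fragment sets in progeny of a recipient plant pollinated by the split-gene plant."""
    return [frozenset(recipient) | p for p in pollen_types(nuclear_hemizygous)]
```

Under this assumption no progeny of a wild-type recipient carries both fragments, so none can reconstitute the active protein.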

ALS/EPSPS transgenic plants will be tested for 
resistance to sulphonyl urea and Roundup®. 

It should be understood that the Examples and 
embodiments described herein are for illustrative purposes 
only and that various modifications or changes in light thereof 
will be apparent to persons skilled in the art and are to be 
included within the spirit and purview of this Application and 
the scope of the appended claims. 

WHAT IS CLAIMED IS:



1. A method of reconstituting a target protein in a 
predetermined location within an organism comprising: 

(a) splitting DNA coding for the target protein into at
least two fragments;

(b) separating the DNA fragments of step (a) to
prevent transmission of the gene coding for the target protein
to other organisms;

(c) expressing the DNA fragments of step (b) within
the organism to produce the corresponding fragments of the
target protein; and

(d) reconstituting the target protein from the protein
fragments.

2. A method of preventing transmission to other organisms of
the gene coding for a target protein from within an organism
containing said DNA coding for the target protein comprising:

(a) splitting DNA coding for the target protein into at
least two fragments; and

(b) separating the DNA fragments of step (a) to
prevent transmission of the gene coding for the target
protein.

3. The method of claim 1 or 2, wherein the organism is 
selected from the group consisting of plants, animals, fungi, 
viruses, prokaryotes, and single-cell eukaryotes. 

4. The method of claim 1 or 2, wherein the DNA coding for
the target protein is split by DNA coding for one or more 
inteins or portions thereof. 

5. The method of claim 4, wherein the DNA coding for the 
target protein is split by forming at least two DNA fusion 
fragments, wherein said DNA fusion fragments comprise a 
portion of the DNA coding for the target protein and a portion 
of DNA coding for the intein. 

6. The method of claim 5, wherein one of said fusion 
fragments is formed by linking the C-terminal end of DNA
coding for an N-terminal portion of the target protein to the N- 
terminal end of the DNA coding for an N-terminal portion of 
the intein, and another of said fusion fragments is formed by 
linking the N-terminal end of DNA coding for a C-terminal 
portion of the target protein to the C-terminal end of DNA 
coding for a C-terminal portion of the intein. 

7. The method of claim 1 or 2, wherein the DNA coding for 
the target protein is split to form two or more DNA fragments 
by DNA coding for one or more affinity domains. 

8. The method of claim 7, wherein the affinity domain is 
selected from the group consisting of inteins or intein 
fragments, leucine zipper and c-Jun/c-Fos. 

9. The method of claim 1 or 2, wherein the DNA fragments
coding for the target protein are separated by
compartmentalizing each DNA fragment into different
compartments selected from a group consisting of the
nucleus, a membrane bound organelle, a plasmid, a virus, a
cosmid, and an artificial chromosome.

10. The method of claim 9, in which at least one of the DNA
fragments coding for the target protein is fused to a DNA
sequence encoding transit peptides such that the protein
products of the DNA fragments are transported into a single
compartment where functional reconstitution can occur.

11. The method of claim 10, in which one of the DNA
fragments coding for a portion of the target protein is
compartmentalized in the nucleus, being fused to a DNA
sequence encoding a transit peptide for transport into
chloroplasts, and the other DNA fragment coding for another
portion of the target protein is compartmentalized in the
chloroplasts.

12. The method of claim 1 or 2, wherein the DNA fragments
coding for the target protein are separated by inserting each
of the fragments into different portions of a DNA molecule
wherein the DNA molecule is selected from the group
consisting of DNA from the nucleus, a membrane bound
organelle, DNA from a plasmid, DNA from a cosmid, DNA from a
virus and DNA from an artificial chromosome.

13. The method of claim 12, wherein at least one of the DNA
molecules is naturally inherited.

14. The method of claim 12, wherein at least one of the DNA 
molecules resides in the chloroplasts. 

15. The method of claim 12, wherein at least one of the DNA 
molecules resides in the mitochondria. 

16. The method of claim 4, wherein reconstitution of the 
target protein fragments comprises intein-mediated splicing. 

17. The method of claim 4, wherein reconstitution of the 
target protein fragments comprises intein-mediated protein 
complementation. 

18. The method of claim 1, wherein reconstitution of the 
target protein fragments comprises protein complementation. 

19. The method of claim 18, wherein protein 
complementation occurs in the presence of an affinity domain. 

20. The method of claim 18, wherein protein 
complementation occurs in the absence of an affinity domain. 

21. The method of claim 1 or 2, wherein splitting of the DNA 
coding for the target protein comprises: 

(a) determining one or more potential split site
regions of the target protein; and 

(b) splitting the DNA coding for the target protein at 
the potential split site region. 

22. The method of claim 21, wherein the potential split site 
region of the target protein is determined by analyzing 
primary amino acid sequence of the target protein for non- 
conserved regions. 

23. The method of claim 21, wherein the potential split site 
region is determined by linker tolerance of linker insertion 
within the target protein. 



24. The method of claim 21, wherein the potential split site 
region is determined by analyzing the structure of the target 
protein for the presence of flexible loops. 



25. The method of claim 21, wherein the potential split site
region is determined by analyzing the structure of the target
protein for the presence of amino acid sequence between
folding domains of the target protein.

26. An isolated DNA fragment comprising a DNA split site in
an EPSPS gene.

27. The isolated DNA fragment of claim 26, wherein the DNA
fragment is selected from the group consisting of the DNA
encoding for amino acids 1-235 or portions thereof, SEQ ID
NO:24, SEQ ID NO:25, SEQ ID NO:28, SEQ ID NO:29, SEQ ID
NO:30, SEQ ID NO:31, SEQ ID NO:36, SEQ ID NO:37, SEQ ID
NO:38 and SEQ ID NO:39.

28. An isolated DNA fragment comprising a DNA split site in
an E. coli ALS gene.

29. The isolated DNA fragment of claim 28, wherein the DNA
fragment is selected from the group consisting of SEQ ID
NO:10, SEQ ID NO:11, SEQ ID NO:12 and SEQ ID NO:13.

30. An isolated DNA fragment comprising a DNA split site in
a maize ALS gene.

31. The isolated DNA fragment of claim 30, wherein the DNA
fragment is selected from the group consisting of SEQ ID
NO:17, SEQ ID NO:17, SEQ ID NO:18 and SEQ ID NO:19.

32. The isolated DNA fragments of claim 26, 28, or 30, 
wherein said DNA fragment is fused to DNA coding for an 
intein or portion thereof. 

FIG. 1B

FIG. 2B

FIG. 6

FIG. 15-1

EPSPS Insertion Site

Amino acid sequence inserted    Clone
CLNIQ                           pCE-5aa 129
VFKHA                           pCE-5aa 47
LFKGP                           pCE-5aa 7
CLNSD                           pCE-5aa 50
CLNIS                           pCE-5aa 8
CLNTO                           pCE-5aa 44
CLNNR                           pCE-5aa 10
CLNSC                           pCE-5aa 32
CLNSD                           pCE-5aa 5
CLNTL                           pCE-5aa 3
VFKQP                           pCE-5aa 12
CLNSM                           pCE-5aa 42
CLNNY                           pCE-5aa 37
CLNTL                           pCE-5aa 22
CLNHA                           pCE-5aa 11
VFKHK                           pCE-5aa 112
CLNTK                           pCE-5aa 212
CLNKD                           pCE-5aa 33
MFKQI                           pCE-5aa 151
CLNII                           pCE-5aa 114
LFKHE                           pCE-5aa 227
VFKHF                           pCE-5aa 1G2
CLNSV                           pCE-5aa 1
VFKQI                           pCE-5aa 2
MFKQA                           pCE-5aa 208
LFKHH                           pCE-5aa 28
LFKHQ                           pCE-5aa 4
MFKHV                           pCE-5aa 203
VFKOK                           pCE-5aa 25
LFKQQ                           pCE-5aa 102
LFKHS                           pCE-5aa 40
CLNTG                           pCE-5aa 35
CLNSR                           pCE-5aa 23
VFKHL                           pCE-5aa 154

FIG. 15-2

FIG. 17

FIG. 18




EXPRESSION PLASMID pAGR3: 5910 bp.
PROMOTER AND CLONING SITE MAP:

lac operator
1 GAATTGTGAG CGCTCACAAT TCTAGGATGT TAATTGCGCC GACATCATAA

-35 region
51 CGGTTCTGGC AAATATTCTG AAATGAGCTG TTGACAATTA ATCATCGGCT

-10 region lac operator rbs
101 CGTATAATGT GTGGAATTGT GAGCGGATAA CAATTTCACA CAGGAAACAG

start
151 ACCATGGTGA ATTCTAGAGC TCGAGGATCC GCGGTACCCG GGCATGCATT
NcoI EcoRI XbaI SacI XhoI BamHI SacII KpnI SmaI BstBI

201 CGAAGCTTCC TTAAGCGGCC GTCGACCGAT GCCCTTGAGA GCCTTCAACC
HindIII AflII EagI SalI

FIG. 24

GAATAGATCTACATACACCTTGGTTGACACGAGTATATAAGTCATGTT
ATACTGTTGAATAACAAGCCTTCCATTTTCTATTTTGATTTGTAGAAA 
ACTAGTGTGCTTGGGAGTCCCTGATGATTAAATAAACCAAGATTTTAC 
CTTAATTAAG 



FIG. 25 

GATCCTGGCCTAGTCTATAGGAGGTTTTGAAAAGAAAGGAGCAATAAT 
CATTTTCTTGTTCTATCAAGAGGGTGCTATTGCTCCTTTCTTTTTTTC 
TTTTTATTTATTTACTAGTATTTTACTTACATAGACTTTTTTGTTTAC 
GTATTCT 



FIG. 26 

catATGGCgTCcATGATcTCCTCgTCcGCgGTGACcACgGTCAGCCGcG 
CgTCcACGGTGCAgTCGGCCGCGGTGGCcCCgTTCGGCGGCCTCAAgTC 
CATGACcGGcTTCCCgGTcAAGAAGGTCAACACgGACATcACgTCCATc 
ACgAGCAAcGGcGGcAGgGTgAAGTGCATGcgaagagc 

FIG. 27-1

GTTAACTACGTCAGGTGGCACTTTTCGGGGAAATGTGCGCGGAACCC 
CTATTTGTTTATTTTTCTAAATACATTCAAATATGTATCCGCTCATG 
AGACAATAACCCTGATAAATGCTTCAATAATATTGAAAAAGGAAGAG 
TATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGG 
CATTTTGCCTTCCTGTTTTTGCTCACCCAGAAACGCTGGTGAAAGTA 
AAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAACT 
GGATCTCAACAGCGGTAAGATCCTTGAGAGTTTTCGCCCCGAAGAAC 
GTTCTCCAATGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCGGTA 
TTATCCCGTGTTGACGCCGGGCAAGAGCAACTCGGTCGCCGCATACA 
CTATTCTCAGAATGACTTGGTTGAGTACTCACCAGTCACAGAAAAGC
ATCTTACGGATGGCATGACAGTAAGAGAATTATGCAGTGCTGCCATA 
ACCATGAGTGATAACACTGCGGCCAACTTACTTCTGACAACGATCGG 
AGGACCGAAGGAGCTAACCGCTTTTTTGCACAACATGGGGGATCATG 
TAACTCGCCTTGATCGTTGGGAACCGGAGCTGAATGAAGCCATACCA 
AACGACGAGCGTGACACCACGATGCCTGTAGCAATGGCAACAACGTT 
GCGCAAACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGGCAAC 
AATTAATAGACTGGATGGAGGCGGATAAAGTTGCAGGACCACTTCTG 
CGCTCGGCCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGAGC
CGGTGAGCGTGGGTCTCGCGGTATCATTGCAGCACTGGGGCCAGATG 
GTAAGCCCTCCCGTATCGTAGTTATCTACACGACGGGGAGTCAGGCA 
ACTATGGATGAACGAAATAGACAGATCGCTGAGATAGGTGCCTCACT 
GATTAAGCATTGGTAACTGTCAGACCAAGTTTACTCATATATACTTT
AGATTGATTTACCCCGGTTGATAATCAGAAAAGCCCCAAAAACAGGA 
AGATTGTATAAGCAAATATTTAAATTGTAAACGTTAATATTTTGTTA 
AAATTCGCGTTAAATTTTTGTTAAATCAGCTCATTTTTTAACCAATA 
GGCCGAAATCGGCAAAATCCCTTATAAATCAAAAGAATAGCCCGAGA
TAGGGTTGAGTGTTGTTCCAGTTTGGAACAAGAGTCCACTATTAAAG 
AACGTGGACTCCAACGTCAAAGGGCGAAAAACCGTCTATCAGGGCGA 
TGGCCCACTACGTGAACCATCACCCAAATCAAGTTTTTTGGGGTCGA 
GGTGCCGTAAAGCACTAAATCGGAACCCTAAAGGGAGCCCCCGATTT 
AGAGCTTGACGGGGAAAGCGAACGTGGCGAGAAAGGAAGGGAAGAAA 
GCGAAAGGAGCGGGCGCTAGGGCGCTGGCAAGTGTAGCGGTCACGCT 
GCGCGTAACCACCACACCCGCCGCGCTTAATGCGCCGCTACAGGGCG 
CGTAAAAGGATCTAGGTGAAGATCCTTTTTGATAATCTCATGACCAA 
AATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAG 
AAAAGATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATC
TGCTGCTTGCAAACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTT 
GCCGGATCAAGAGCTACCAACTCTTTTTCCGAAGGTAACTGGCTTCA 
GCAGAGCGCAGATACCAAATACTGTTCTTCTAGTGTAGCCGTAGTTA 
GGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTCGCTCT 
GCTAATCCTGTTAC 



WO 00/71701 09/936588 PCT/US00/14122


FIG. 27-2 

CAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGA
CTCAAGACGATAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACG 
GGGGGTTCGTGCACACAGCCCAGCTTGGAGCGAACGACCTACACCG 
AACTGAGATACCTACAGCGTGAGCTATGAGAAAGCGCCACGCTTCC
CGAAGGGAGAAAGGCGGACAGGTATCCGGTAAGCGGCAGGGTCGGA 
ACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATC 
TTTATAGTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATT 
TTTGTGATGCTCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGC 
AACGCGGCCTTTTTACGGTTCCTGGCCTTTTGCTGGCCTTTTGCTC 
ACATGTAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCTTTAC 
ACTTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATA 
ACAATTTCACACAGGAAACAGCTATGACCATGATTACGCCAAGCTA 
CGTAATACGACTCACTAGTGGGCAGATCTTCGAATGCATCGCGCGC 
TTGACGATATAGCAATTTTGCTTGGATTTATCAGTCGAAGCAGGAG 
ACAATATACCTTGATATTCTCGATCATTCTTTGATTCAAAGCATCG 
TTCCATCTCAATTGAAAAAGCAAATAACGTTTCAAGAACAAATCTA 
GTTCTGCTTCCGTGTTGCTTTTGTATTGTTTTTTCTTTTTACCCTT 
CTTTGTGTCTGATTCCGCGTAATCTTTTTTAAGAGCGTTTTGATGT 
TTTGAGAGAACAGGGCCCAGATTTCCTTTGTTTTCTATATCTGATC 
CACGCTCTTTTTCTCCTTGACTTGCGGGTTCTTTTGCTTCTTGAAT 
TCGATTCTTTATTTTTTTATTTGATCGTAGAAAAAAGTTTTGTTTT
TGGTTTTTATTGATGTTTTTATTTTGACTAACATTTTCATTTGTAT 
TCAAATTTAAAAGAAGTAATTTGCTTGGTATAATCCACGGTTTTAT 
TTTATATACATTATAAAGTGGTACAAATTCTGGGAAGAACCAAAAT 
TCCAGATTCAATATGGGACGATTTAATATTTTTTCATTCATTCCCA 
TCCAATCAAAAAAGGCTTTTTTCGAATTTTTTTGATTGTTTTCTGG
ATTTTGATGAATCGTAAGATAAAAAAAGCCTTTTTTATCAATTTTA 
TCAATTATTTGATAATTATTAATACCAATTTTAGTATTTGGATTAC 
TGTTGGTATCGATCTTAACCCAGGCCTCAATATCTTCTTTTTGTCT
AAGAGAAAAATGGATAATTTTCCAATCAAAATATTTTCTATCGAGA 
TTTCTTTCTATATATAGAATATTGCCTTTTCTTAGATAATTATTGA 
TATGAAGATTGCCGAGCATATCAAAAAGGTTGTGTTTGGACGTGTT 
GGAATTAGAAGAAATTTCGAGGTTCTTATTTACTTGAAAGGGTAAT
CTAGAAATAAAAGAGTCATTTTTTTTTTCATAATTAATCGATTTAT 
ATGCTAAAAGATCATATCTATAACATTTTTGAAAATTATCTTTTTG 
GTTTGCTAATGAATAGAGCTCAGAATCATTTTCTTTTTTGTAATGA
ATTAATTGGTCTTTTTCATATGAATTCCATTTGTTTAAATTTCGAT 
TTTGAGCCATACAACCTTGATTAACCCTATTTCGCCATTTTTGTGG 
CATTAATCTAGACCATCTAATCTGAGATAAATCGTACGagaatact
caatCATGAATAAATGCAAGAAAATAACCTCTCCTTCTTTTTCTAT
AATGTAAACAAAAAAGTCTATGTAAGTAAAATACTAGTAAATAAAT
AAAAAGAAAAAAAGAAAGGAGCAATAGCACCCTCTTGATAGAACAA 
GAAAATGATTAT 

FIG. 27-3

TGCTCCTTTCTTTTCAAAACCTCCTATAGACTAGGCCAGGATCCTCGA
GcttaattaaGGTAAAATCTTGGTTTATTTAATCATCAGGGACTCCCA 
AGCACACTAGTTTTCTACAAATCAAAATAGAAAATAGAAAATGGAAGG 
CTTTTTATTCAACAGTATAACATGACTTATATACTCGTGTCAACCAAG 
GTGTATGTAGATCtattcCTGCAGGATATCTGGATCCACGAAGCTTCC
CATGGGAATAGATCTACATACACCTTGGTTGACACGAGTATATAAGTC 
ATGTTATACTGTTGAATAAAAAGCCTTCCATTTTCTATTTTGATTTGT 
AGAAAACTAGTGTGCTTGGGAGTCCCTGATGATTAAATAAACCAAGAT 
TTTACCGTTTAAACACCGGTGATCCTGGCCTAGTCTATAGGAGGTTTT 
GAAAAGAAAGGAGCAATAATCATTTTCTTGTTCTATCAAGAGGGTGCT
ATTGCTCCTTTCTTTTTTTCTTTTTATTTATTTACTAGTATTTTACTT 
ACATAGACTTTTTTGTTTACATTATAGAAAAAGAAGGAGAGGTTATTT 
TCTTGCATTTATTCATGATTGAGTATTCTcctaggCGTATTGATAATG 
CCGTCTTAACCAGTTTTTCCATTGATTGATTCTATAACTCTGAAGTTT 
CTTATGTTTTAATTCAGAATGAAATATTCCTAGTGTTCGAAAATAGTC 
CTTTATTTTAGTCTTAAGGAAAAAAGACGTTCTGTTATATTGAAGAAC 
AGATCTTAATTTAGACAAATTAATAACTTGGGGTTGTGATAATTTGTA 
AAATACATATGCTTGTGATAAGTAGGATAAATCAAAAAAAATATGTGA 
ATTTTTCTTACTAATATTATAAAGTGACTTTTTTATAGTCGAAATAAA 
GTGMTTTnnTTGATTAnMTTTTTTCTTGATTTATTTCATTATT 
GGAAATGTATTTATCAATCAATTTGTTTGTTGATTCAAGAAAGAGTTG
TGTATTAATTCTGGGAATATTAATGATAGATAAAAATAGATCGATGTA 
TAATCTTTGAATGAATAATTTTAGAAAATAATGGAATTTCCATATTAA 
TCGAGTATTTCTTCTTTTTAATATTTGGAAAATCTTTTTTGGCGATTC 
GAATTTTTTAATATTATTTGTTTTATTAGGACTAATGTCTATTTCTGG 
AGTTACTTTCTTTTTCTCTTTTGTAATTCTTTCTATTTGATTTTTGAT 
TGTACTTGTTCTATCAGTCAAATCCTTCATTTTGCTTTCTATCAGTGA 
AGAATTTGGCCAATTTCCAGATTCAATTTGACTAAATGATTCGTTAAT 
TATCTGATTACTCATTAGAGAATCTTTTTCTTTTTTCGTTTCATTCGA 
TTCATCTATTTCTTTGAGTCTAAATAATACAATTGGATTTACTTTTGA 
AAGTTCTTTTTTCATTTTTTTTATAAATAGACTACTTTTGATAAGCCA 
TTTTTTGGTTTCTTTTGAAATTCTTCGAAATAATTTTATTTTTCCTTT 
GAAAACTTTTAGAGTTATAAAATATTTCTTTTTGAATTTTCCAATTTT 
TTTTTCGAGTTCCTTAAAAATGGGCTCAAAAAAAGAAGGGCGTTTTCG 
GGGAGAACCAAAGGGAAGTTCAGCTTCCATTCCCCAAACTGTTAAAAA 
ACAAAAATCATCTTTTTGTTTTTTCTTTTTCATTAGCTCTCCACGGGA
GGAGTACAGTTTAGATATATGCCAAGGTTTCAGACAAAAAGGAAATAA 
TATTTTGATCTGAATGCCATCTTTCAACCAATTTTTTGGAAATTCTGT 
TTCTGATAATTGAACACCATTATAAGTACATTTAATATGCATTTCTCT 
ATTCCATTCCTGCAAATCTTCAGACCATTCAGGAAGTTGCAAGACTAA 
CATACGCCCGAGATTTTTGGCTATTATCAATGAAGGTAATACAATATA 
TTTTCGAAGAATTG 

FIG. 27-4

ATTGAGTTATTAACATGTAACCTCTTATTATTTGCGCAAAAGGAATGGT
ATCCCAGGCTTCTGCTATCTCTATCCGTGCTTTTTCCTTTCTTTTGTTC 
TCCCCTTTTTTGTCCTTTTCCTTTTTCTCTTCTCTTTTTGTTTGTTCTT 
CTCTAGACTCTAGAATCTTGAATTCGGTACCCTCTAGTCAAGGCCTTAA 
GTGAGTCGTATTACGGACTGGCCGTCGTTTTACAACGTCGTGACTGGGA 
AAACCCTGGCGTTACCCAACTTAATCGCCTTGCAGCACATCCCCCTTTC 
GCCAGCTGGCGTAATAGCGAAGAGGCCCGCACCGATCGCCCTTCCCAAC 
AGTTGCGCAGCCTGAATGGCGAATGGCGCTTCGCTTGGTAATAAAGCCC 
GCTTCGGCGGGCTTTTTTTT 

FIG. 28-1

GTTAACTACGTCAGGTGGCACTTTTCGGGGAAATGTGCGCGGAACC
CCTATTTGTTTATTTTTCTAAATACATTCAAATATGTATCCGCTCA 
TGAGACAATAACCCTGATAAATGCTTCAATAATATTGAAAAAGGAA 
GAGTATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTT 
GCGGCATTTTGCCTTCCTGTTTTTGCTCACCCAGAAACGCTGGTGA 
AAGTAAAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACAT 
CGAACTGGATCTCAACAGCGGTAAGATCCTTGAGAGTTTTCGCCCC 
GAAGAACGTTCTCCAATGATGAGCACTTTTAAAGTTCTGCTATGTG 
GCGCGGTATTATCCCGTGTTGACGCCGGGCAAGAGCAACTCGGTCG 
CCGCATACACTATTCTCAGAATGACTTGGTTGAGTACTCACCAGTC 
ACAGAAAAGCATCTTACGGATGGCATGACAGTAAGAGAATTATGCA 
GTGCTGCCATAACCATGAGTGATAACACTGCGGCCAACTTACTTCT 
GACAACGATCGGAGGACCGAAGGAGCTAACCGCTTTTTTGCACAAC 
ATGGGGGATCATGTAACTCGCCTTGATCGTTGGGAACCGGAGCTGA 
ATGAAGCCATACCAAACGACGAGCGTGACACCACGATGCCTGTAGC 
AATGGCAACAACGTTGCGCAAACTATTAACTGGCGAACTACTTACT 
CTAGCTTCCCGGCAACAATTAATAGACTGGATGGAGGCGGATAAAG 
TTGCAGGACCACTTCTGCGCTCGGCCCTTCCGGCTGGCTGGTTTAT 
TGCTGATAAATCTGGAGCCGGTGAGCGTGGGTCTCGCGGTATCATT 
GCAGCACTGGGGCCAGATGGTAAGCCCTCCCGTATCGTAGTTATCT 
ACACGACGGGGAGTCAGGCAACTATGGATGAACGAAATAGACAGAT 
CGCTGAGATAGGTGCCTCACTGATTAAGCATTGGTAACTGTCAGAC 
CAAGTTTACTCATATATACTTTAGATTGATTTACCCCGGTTGATAA 
TCAGAAAAGCCCCAAAAACAGGAAGATTGTATAAGCAAATATTTAA 
ATTGTAAACGTTAATATTTTGTTAAAATTCGCGTTAAATTTTTGTT 
AAATCAGCTCATTTTTTAACCAATAGGCCGAAATCGGCAAAATCCC 
TTATAAATCAAAAGAATAGCCCGAGATAGGGTTGAGTGTTGTTCCA 
GTTTGGAACAAGAGTCCACTATTAAAGAACGTGGACTCCAACGTCA 
AAGGGCGAAAAACCGTCTATCAGGGCGATGGCCCACTACGTGAACC 
ATCACCCAAATCAAGTTTTTTGGGGTCGAGGTGCCGTAAAGCACTA 
AATCGGAACCCTAAAGGGAGCCCCCGATTTAGAGCTTGACGGGGAA 
AGCGAACGTGGCGAGAAAGGAAGGGAAGAAAGCGAAAGGAGCGGGC 
GCTAGGGCGCTGGCAAGTGTAGCGGTCACGCTGCGCGTAACCACCA 
CACCCGCCGCGCTTAATGCGCCGCTACAGGGCGCGTAAAAGGATCT 
AGGTGAAGATCCTTTTTGATAATCTCATGACCAAAATCCCTTAACG 
TGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAAAGATCAAA 
GGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGCTTGC 
AAACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCA 
AGAGCTACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCG 
CAGATACCAAATACTGTTCTTCTAGTGTAGCCGTAGTTAGGCCACC 
ACTTCAAGAACTCTGTAGCACCGCCTACATACCTCGCTCTGCTAAT 
CCTGTTAC 

FIG. 28-2

CAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGA
CTCAAGACGATAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACG 
GGGGGTTCGTGCACACAGCCCAGCTTGGAGCGAACGACCTACACCG 
AACTGAGATACCTACAGCGTGAGCTATGAGAAAGCGCCACGCTTCC 
CGAAGGGAGAAAGGCGGACAGGTATCCGGTAAGCGGCAGGGTCGGA 
ACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATC 
TTTATAGTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATT 
TTTGTGATGCTCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGC 
AACGCGGCCTTTTTACGGTTCCTGGCCTTTTGCTGGCCTTTTGCTC 
ACATGTAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCTTTAC 
ACTTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATA 
ACAATTTCACACAGGAAACAGCTATGACCATGATTACGCCAAGCTA
CGTAATACGACTCACTAGTGGGCAGATCTTCGAATGCATCGCGCGC 
AATTCACCGCCGTATGGCTGACCGGCGATTACTAGCGATTCCGGCT 
TCATGCAGGCGAGTTGCAGCCTGCAATCCGAACTGAGGACGGGTTT 
TTGGGGTTAGCTCACCCTCGCGGGATCGCGACCCTTTGTCCCGGCC 
ATTGTAGCACGTGTGTCGCCCAGGGCATAAGGGGCATGATGACTTG 
ACGTCATCCTCACCTTCCTCCGGCTTATCACCGGCAGTCTGTTCAG
GGTTCCAAACTCAACGATGGCAACTAAACACGAGGGTTGCGCTCGT 
TGCGGGACTTAACCCAACACCTTACGGCACGAGCTGACGACAGCCA 
TGCACCACCTGTGTCCGCGTTCCCGAAGGCACCCCTCTCTTTCAAG
AGGATTCGCGGCATGTCAAGCCCTGGTAAGGTTCTTCGCTTTGCAT 
CGAATTAAACCACATGCTCCACCGCTTGTGCGGGCCCCCGTCAATT 
CCTTTGAGTTTCATTCTTGCGAACGTACTCCCCAGGCGGGATACTT
AACGCGTTAGCTACAGCACTGCACGGGTCGATACGCACAGCGCCTA 
GTATCCATCGTTTACGGCTAGGACTACTGGGGTATCTAATCCCATT 
CGCTCCCCTAGCTTTCGTCTCTCAGTGTCAGTGTCGGCCCAGCAGA 
GTGCTTTCGCCGTTGGTGTTCTTTCCGATCTCTACGCATTTCACCG 
CTCCACCGGAAATTCCCTCTGCCCCTACCGTACTCCAGCTTGGTAG 
TTTCCACCGCCTGTCCAGGGTTGAGCCCTGGGATTTGACGGCGGAC 
TTAAAAAGCCACCTACAGACGCTTTACGCCCAATCATTCCGGATAA 
CGCTTGCATCCTCTGTATTACCGCGGCTGCTGGCACAGAGTTAGCC 
GATGCTTATTCCCCAGATACCGTCATTGCTTCTTCTCCGGGAAAAG 
AAGTTCACGACCCGTGGGCCTTCTACCTCCACGCGGCATTGCTCCG 
TCAGCTTTCGCCCATTGCGGAAAATTCCCCACTGCTGCCTCCCGTA 
GGAGTCTGGGCCGTGTCTCAGTCCCAGTGTGGCTGATCATCCTCTC 
GGACCAGCTACTGATCATCGCCTTGGTAAGCTATTGCCTCACCAAC 
TAGCTAATCAGACGCGAGCCCCTCCTCGGGCGGATTCCTCCTTTTG 
CTCCTCAGCCTACGGGGTATTAGCAGCCGTTTCCAGCTGTTGTTCC 
CCTCCCAAGGGCAGGTTCTTACGCGTTACTCACCCGTCCGCCACTG 
GAAACACCACTTCCCGTCCGACTTGCATGTGTTAAGC 

FIG. 28-3

ATGCCGCCAGCGTTCATCCTGAGCCAGGATCGAACTCTCCATGAGAT
TCATAGTTGCATTACTTATAGCTTCCTTGTTCGTAGACAAAGCGGAT 
TCGGAATTGTCTTTCATTCCAAGGCATAACTTGTATCCATGCGCTTC 
ATATTCGCCCGGAGTTCGCTCCCAGAAATATAGCCATCCCTGCCCCC 
TCACGTCAATCCCACGAGCCTCTTATCCATTCTCATTGAACGACGGC 
GGGGGAGCAAATCCAACTAGAAAAACTCACATTGGGCTTAGGGATAA 
TCAGGCTCGAACTGATGACTTCCACCACGTCAAGGTGACACTCTACC
GCTGAGTTATATCCCTTCCCCGCCCCATCGAGAAATAGAACTGACTA 
ATCCTAAGTCAAAGGCGTACGagaatactcaatCATGAATAAATGCA 
AGAAAATAACCTCTCCTTCTTTTTCTATAATGTAAACAAAAAAGTCT 
ATGTAAGTAAAATACTAGTAAATAAATAAAAAGAAAAAAAGAAAGGA
GCAATAGCACCCTCTTGATAGAACAAGAAAATGATTATTGCTCCTTT 
CTTTTCAAAACCTCCTATAGACTAGGCCAGGATCCTCGAGcttaatt 
aaGGTAAAATCTTGGTTTATTTAATCATCAGGGACTCCCAAGCACAC 
TAGTTTTCTACAAATCAAAATAGAAAATAGAAAATGGAAGGCTTTTT 
ATTCAACAGTATAACATGACTTATATACTCGTGTCAACCAAGGTGTA 
TGTAGATCtattcCTGCAGGATATCTGGATCCACGAAGCTTCCCATG
GGAATAGATCTACATACACCTTGGTTGACACGAGTATATAAGTCATG 
TTATACTGTTGAATAAAAAGCCTTCCATTTTCTATTTTGATTTGTAG 
AAAACTAGTGTGCTTGGGAGTCCCTGATGATTAAATAAACCAAGATT 
TTACCGTTTAAACACCGGTGATCCTGGCCTAGTCTATAGGAGGTTTT 
GAAAAGAAAGGAGCAATAATCATTTTCTTGTTCTATCAAGAGGGTGC
TATTGCTCCTTTCTTTTTTTCTTTTTATTTATTTACTAGTATTTTAC 
TTACATAGACTTTTTTGTTTACATTATAGAAAAAGAAGGAGAGGTTA 
TTTTCTTGCATTTATTCATGATTGAGTATTCTcctaggGTCGAGAAA
CTCAACGCCACTATTCTTGAACAACTTGGAGCCGGGCCTTCTTTTCG
CACTATTACGGATATGAAAATAATGGTCAAAATCGGATTCAATTGTC 
AACTGCCCCTATCGGAAATAGGATTGACTACCGATTCCGAAGGAACT 
GGAGTTACATCTCTTTTCCATTCAAGAGTTCTTATGCGTTTCCACGC 
CCCTTTGAGACCCCGAAAAATGGACAAATTCCTTTTCTTAGGAACAC 
ATACAAGATTCGTCACTACAAAAAGGATAATGGTAACCCTACCATTA 
ACTACTTCATTTATGAATTTCATAGTAATAGAAATACATGTCCTACC 
GAGACAGAATTTGGAACTTGCTATCCTCTTGCCTAGCAGGCAAAGAT 
TTACCTCCGTGGAAAGGATGATTCATTCGGATCGACATGAGAGTCCA 
ACTACATTGCCAGAATCCATGTTGTATATTTGAAAGAGGTTGACCTC
CTTGCTTCTCTCATGGTACACTCCTCTTCCCGCCGAGCCCCTTTTCT 
CCTCGGTCCACAGAGACAAAATGTAGGACTGGTGCCAACAATTCATC 
AGACTCACTAAGTCGGGATCACTAACTAATACTAATCTAATATAATA 
GTCTAATATATCTAATATAATAGAAAATACTAATATAATAGAAAAGA 
ACTGTCTTTTCTGTATACTTTCCCCGGTTCCGTTGCTACCGCGGGCT 
TTACGCAATCGATCGGATTAGATAGATATCCCTTCAACATAGGTCAT 
CGA 

FIG. 28-4

AAGGATCTCGGAGACCCACCAAAGTACGAAAGCCAGGATCTTTCAG
AAAACGGATTCCTATTCAAAGAGTGCATAACCGCATGGATAAGCTC 
ACACTAACCCGTCAATTTGGGATCCAAATTCGAGATTTTCCTTGGG 
AGGTATCGGGAAGGATTTGGAATGGAATAATATCGATTCATACAGA 
AGAAAAGGTTCTCTATTGATTCAAACACTGTACCTAACCTATGGGA 
TAGGGATCGAGGAAGGGGAAAAACCGAAGATTTCACATGGTACTTT
TATCAATCTGATTTATTTCGTACCTTTCGTTCAATGAGAAAATGGG 
TCAAATTCTACAGGATCAAACCTATGGGACTTAAGGAATGATATAA 
AAAAAAGAGAGGGAAAATATTCATATTAAATAAATATGAAGTAGAA 
GAACCCAGATTCCAAATGAACAAATTCAAACTTGAAAAGGATCTTC 
CTTATTCTTGAAGAATGAGGGGCAAAGGGATTGATCAAGAAAGATC 
TTTTGTTCTTCTTATATATAAGATCGTGATGGTACCCTCTAGTCAA 
GGCCTTAAGTGAGTCGTATTACGGACTGGCCGTCGTTTTACAACGT
CGTGACTGGGAAAACCCTGGCGTTACCCAACTTAATCGCCTTGCAG 
CACATCCCCCTTTCGCCAGCTGGCGTAATAGCGAAGAGGCCCGCAC 
CGATCGCCCTTCCCAACAGTTGCGCAGCCTGAATGGCGAATGGCGC 
TTCGCTTGGTAATAAAGCCCGCTTCGGCGGGCTTTTTTTT 

SEQUENCE LISTING

<110> XU, Ming-Qun
EVANS, Thomas C.
PRADHAN, Sriharsa
COMB, Donald G.
PAULUS, Henry
SUN, Luo
CHEN, Lixin
GHOSH, Inca
NEW ENGLAND BIOLABS, INC.
BOSTON BIOMEDICAL RESEARCH INSTITUTE

<120> METHOD FOR GENERATING SPLIT, NON-TRANSFERABLE GENES
THAT ARE ABLE TO EXPRESS AN ACTIVE PROTEIN PRODUCT

<130> NEB-163-PCT

<140>
<141>

<150> 60/135,677
<151> 1999-05-24

<160> 134

<170> PatentIn Ver. 2.0

<210> 1 
<211> 19 
<212> DNA 

<213> Escherichia coli 
<400> 1 

ggacggggaa ctaactatg 19 

<210> 2
<211> 20
<212> DNA
<213> Escherichia coli
<400> 2

ccacgatgac gcaccacgcg 20

<210> 3
<211> 30
<212> DNA
<213> Escherichia coli
<400> 3

ggagggggca natgaatggc gcacagtggg 30



<210> 4 
<211> 25 
<212> DNA 

<213> Escherichia coli 
<400> 4 

ggggggtcat gataatttct ccaac 25 

<210> 5
<211> 28
<212> DNA
<213> Escherichia coli
<400> 5

ccgggtggcg taattatgcc ggtttacg 28

<210> 6
<211> 28
<212> DNA
<213> Escherichia coli
<400> 6

cgtaaaccgg cataattacg ccacccgg 28

<210> 7
<211> 14
<212> PRT
<213> Synechocystis PCC6803
<400> 7

Leu Glu Lys Phe Ala Glu Tyr Cys Phe Asn Lys Ser Thr Gly
1 5 10

<210> 8
<211> 21
<212> PRT
<213> Escherichia coli
<400> 8

Cys Ala Gln Trp Val Val His Ala Leu Arg Ala Gln Gly Val Asn Thr
1 5 10 15

Val Phe Gly Tyr Gly
20



<210> 9 
<211> 20 
<212> PRT 

<213> Escherichia coli 
<400> 9 

Cys Val Trp Pro Leu Val Pro Pro Gly Ala Ser Asn Ser Glu Met Leu 
1 5 10 15 

Glu Lys Leu Ser 
20 



<210> 10
<211> 26
<212> DNA
<213> Escherichia coli
<400> 10

gggggtcatg aatggcgcac agtggg 26

<210> 11
<211> 34
<212> DNA
<213> Escherichia coli
<400> 11

gcgcgctcga gttgatttaa cggctgctgt aatg 34

<210> 12
<211> 32
<212> DNA
<213> Escherichia coli
<400> 12

gcgcgaccgg ttgtgactgg cagcaacact gc 32

<210> 13
<211> 31
<212> DNA
<213> Escherichia coli
<400> 13

ggggggctgc agtcatgata atttctccaa c 31



<210> 14
<211> 22
<212> DNA
<213> MAIZE
<400> 14

atcagtacac agtcctgcca tc 22

<210> 15
<211> 20
<212> DNA
<213> MAIZE
<400> 15

gagacagccg ccgcaaccat 20

<210> 16
<211> 29
<212> DNA
<213> MAIZE
<400> 16

gggcccatat ggccaccgcc gccgccgcg 29

<210> 17
<211> 29
<212> DNA
<213> MAIZE
<400> 17

gggccctcga ggcttccttc aagaagagc 29

<210> 18
<211> 29
<212> DNA
<213> MAIZE
<400> 18

gggccaccgg tacatcaaag aagagcttg 29

<210> 19
<211> 31
<212> DNA
<213> MAIZE
<400> 19

ggggctgcat tcagtacaca gtcctgccat c 31



<210> 20 
<211> 7 
<212> PRT 

<213> Synechocystis PCC6803 
<400> 20 

Leu Glu Lys Phe Ala Glu Tyr 
1 5 



<210> 21 
<211> 7 
<212> PRT 

<213> Synechocystis PCC6803 
<400> 21 

Cys Phe Asn Lys Ser Thr Gly 
1 5 



<210> 22
<211> 21
<212> PRT
<213> MAIZE
<400> 22

Cys Lys Gly Ala Asp Ile Leu Val Glu Ser Leu Glu Arg Cys Gly Val
1 5 10 15

Arg Asp Val Phe Ala
20

<210> 23
<211> 21
<212> PRT
<213> MAIZE
<400> 23

Cys Ile Pro Ser Gly Gly Ala Phe Lys Asp Met Ile Leu Asp Gly Asp
1 5 10 15

Gly Arg Thr Val Tyr
20

<210> 24
<211> 44
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: Synthetic,
based on Salmonella typhimurium
<400> 24

ggatcctaag aaggagatat acccatggaa tccctgacgt taca 44

<210> 25 
<211> 38 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic, 
based on Salmonella typhimurium 

<400> 25 

gtcgacgctc tcctgcagtt aggcaggcgt actcattc 

<210> 26 
<211> 38 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic, 
based on Salmonella typhimurium 

<400> 26 

gctttgctcc tggcggcttt accttgtggt aaaaccgc 

<210> 27 
<21i> 38 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic, 
based on Salmonella typhimurium 

<400> 27 

gcggttttac cacaaggtaa agccgccagg agcaaagc 

<210> 28 
<211> 25 






<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic,
based on Salmonella typhimurium

<400> 28 

gcccctaaag acacaattat tcgcg 25 

<210> 29
<211> 25
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: Synthetic,
based on Salmonella typhimurium

<400> 29

cagcggcgcc gtcatcagca gagcg 25

<210> 30
<211> 25
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: Synthetic,
based on Salmonella typhimurium

<400> 30

gcgaaccacc actaccaaca atttg 25

<210> 31
<211> 25
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: Synthetic,
based on Salmonella typhimurium

<400> 31

tatctccacg ccaaaggttt tcatt 25



<210> 32 
<211> 21 






<212> DNA 

<213> Artificial Sequence
<220> 

<223> Description of Artificial Sequence: Synthetic 
based on Salmonella typhimurium 

<400> 32 

gaatattgcc tgtcttttgg t 

<210> 33 
<211> 21 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
based on Salmonella typhimurium 

<400> 33 

gttaaagcag ttagcagcga t 

<210> 34 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
based on Salmonella typhimurium 

<400> 34 

tgctgaatat tgcctgtctt ttgg 

<210> 35 
<211> 26 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
based on Salmonella typhimurium 

<400> 35 

ccgttaaagc agttagcagc gatagc 

<210> 36 
<211> 44 






<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic, 
based on Salmonella typhimurium 

<400> 36 

ggatcctaag aaggagatat acccatggaa tccctgacgt taca 44 

<210> 37 
<211> 39 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic, 
based on Salmonella typhimurium 

<400> 37 

gatatcctgc agttaacctg gagagtgata ctgttgacc 39 

<210> 38 
<211> 36 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic, 
based on Salmonella typhimurium 

<400> 38 

gatatcccat gggacgctat ctggtcgagg gcgatg 36 

<210> 39 
<211> 38 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic, 
based on Salmonella typhimurium 

<400> 39 

gtcgacgctc tcctgcagtt aggcaggcgt actcattc 38

<210> 40 
<211> 31 






<212> DNA 

<213> Artificial Sequence
<220> 

<223> Description of Artificial Sequence: Synthetic from 
Synechocystis species PCC6803 

<400> 40 

tgctgaatat gcgctgtctt ttggtaccga a 31 

<210> 41 
<211> 29 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic from
Synechocystis species PCC6803 

<400> 41 

ccgttaaacg ccgcagcagc gatagcgcc 29 

<210> 42 
<211> 178 
<212> PRT 

<213> Escherichia coli 
<400> 42 

Tyr Ala Val Asp Lys Ala Asp Leu Leu Leu Ala Leu Gly Val Arg Phe
1 5 10 15

Asp Asp Arg Val Thr Lys Ile Glu Ala Phe Ala Ser Arg Ala Lys Ile
20 25 30

Val His Val Asp Ile Asp Pro Ala Glu Ile Gly Lys Asn Lys Gln Pro
35 40 45

His Val Ser Ile Cys Ala Asp Val Lys Leu Ala Leu Gln Gly Met Asn
50 55 60

Ala Leu Leu Glu Gly Ser Thr Ser Lys Lys Ser Phe Asp Phe Gly Ser
65 70 75 80

Trp Asn Asp Glu Leu Asp Gln Gln Lys Arg Glu Phe Pro Leu Gly Tyr
85 90 95

Lys Thr Ser Asn Glu Glu Ile Gln Pro Gln Tyr Ala Ile Gln Val Leu
100 105 110

Asp Glu Leu Thr Lys Gly Glu Ala Ile Ile Gly Thr Gly Val Gly Gln
115 120 125

His Gln Met Trp Ala Ala Gln Tyr Tyr Thr Tyr Lys Arg Pro Arg Gln
130 135 140

Trp Leu Ser Ser Ala Gly Leu Gly Ala Met Gly Phe Gly Leu Pro Ala
145 150 155 160

Ala Ala Gly Ala Ser Val Ala Asn Pro Gly Val Thr Val Val Asp Ile
165 170 175

Asp Gly



<210> 43 
<211> 179 
<212> PRT 

<213> Escherichia coli
<400> 43

Tyr Ala Val Asp Ser Ser Asp Leu Leu Leu Ala Phe Gly Val Arg Phe
1 5 10 15

Asp Asp Arg Val Thr Gly Lys Leu Glu Ala Phe Ala Ser Arg Ala Lys
20 25 30

Ile Val His Ile Asp Ile Asp Ser Ala Glu Ile Gly Lys Asn Lys Gln
35 40 45

Pro His Val Ser Ile Cys Ala Asp Ile Lys Leu Ala Leu Gln Gly Leu
50 55 60

Asn Ser Ile Leu Glu Ser Lys Glu Gly Lys Leu Lys Leu Asp Phe Ser
65 70 75 80

Ala Trp Arg Gln Glu Leu Thr Glu Gln Lys Val Lys His Pro Leu Asn
85 90 95

Phe Lys Thr Phe Gly Asp Ala Ile Pro Pro Gln Tyr Ala Ile Gln Val
100 105 110

Leu Asp Glu Leu Thr Asn Gly Asn Ala Ile Ile Ser Thr Gly Val Gly
115 120 125

Gln His Gln Met Trp Ala Ala Gln Tyr Tyr Lys Tyr Arg Lys Pro Arg
130 135 140

Gln Trp Leu Thr Ser Gly Gly Leu Gly Ala Met Gly Phe Gly Leu Pro
145 150 155 160

Ala Ala Ile Gly Ala Ala Val Gly Arg Pro Asp Glu Val Val Val Asp
165 170 175

Ile Asp Gly



<210> 44 
<211> 179 
<212> PRT 

<213> Escherichia coli 
<400> 44 

Tyr Ala Val Asp Ser Ser Asp Leu Leu Leu Ala Phe Gly Val Arg Phe
1 5 10 15

Asp Asp Arg Val Thr Gly Lys Leu Glu Ala Phe Ala Ser Arg Ala Lys
20 25 30

Ile Val His Ile Asp Ile Asp Ser Ala Glu Ile Gly Lys Asn Lys Gln
35 40 45

Pro His Val Ser Ile Cys Ala Asp Ile Lys Leu Ala Leu Gln Gly Leu
50 55 60

Asn Ser Ile Leu Glu Ser Lys Glu Gly Lys Leu Lys Leu Asp Phe Ser
65 70 75 80

Ala Trp Arg Gln Glu Leu Thr Val Gln Lys Val Lys Tyr Pro Leu Asn
85 90 95

Phe Lys Thr Phe Gly Asp Ala Ile Pro Pro Gln Tyr Ala Ile Gln Val
100 105 110

Leu Asp Glu Leu Thr Asn Gly Ser Ala Ile Ile Ser Thr Gly Val Gly
115 120 125

Gln His Gln Met Trp Ala Ala Gln Tyr Tyr Lys Tyr Arg Lys Pro Arg
130 135 140

Gln Trp Leu Thr Ser Gly Gly Leu Gly Ala Met Gly Phe Gly Leu Pro
145 150 155 160

Ala Ala Ile Gly Ala Ala Val Gly Arg Pro Asp Glu Val Val Val Asp
165 170 175

Ile Asp Gly



<210> 45
<211> 180
<212> PRT 

<213> Escherichia coli 
<400> 45 

Met Thr Met His Asn Ala Asp Val Ile Phe Ala Val Gly Val Arg Phe
1 5 10 15

Asp Asp Arg Thr Thr Asn Asn Leu Ala Lys Tyr Cys Pro Asn Ala Thr
20 25 30

Val Leu His Ile Asp Ile Asp Pro Thr Ser Ile Ser Lys Thr Val Thr
35 40 45

Ala Asp Ile Pro Ile Val Gly Asp Ala Arg Gln Val Leu Glu Gln Met
50 55 60

Leu Glu Leu Leu Ser Gln Glu Ser Ala His Gln Pro Leu Asp Glu Ile
65 70 75 80

Arg Asp Trp Trp Gln Gln Ile Glu Gln Trp Arg Ala Arg Gln Cys Leu
85 90 95

Lys Tyr Asp Thr His Ser Glu Lys Ile Lys Pro Gln Ala Val Ile Glu
100 105 110

Thr Leu Trp Arg Leu Thr Lys Gly Asp Ala Tyr Val Thr Ser Asp Val
115 120 125

Gly Gln His Gln Met Phe Ala Ala Leu Tyr Tyr Pro Phe Asp Lys Pro
130 135 140

Arg Arg Trp Ile Asn Ser Gly Gly Leu Gly Thr Met Gly Phe Gly Leu
145 150 155 160

Pro Ala Ala Leu Gly Val Lys Met Ala Leu Pro Glu Glu Thr Val Val
165 170 175

Cys Val Thr Gly
180

<210> 46
<211> 170 
<212> PRT 

<213> Escherichia coli 
<400> 46 

Phe Ala Val Gln Glu Cys Asp Leu Leu Ile Ala Val Gly Ala Arg Phe
1 5 10 15

Asp Asp Arg Val Thr Gly Lys Leu Asn Thr Ser Ala Pro His Ala Ser
20 25 30

Val Ile His Met Asp Ile Asp Pro Ala Glu Met Asn Lys Leu Arg Gln
35 40 45

Ala His Val Ala Leu Gln Gly Asp Leu Asn Ala Leu Leu Pro Ala Leu
50 55 60

Gln Gln Pro Leu Asn Gln Cys Asp Trp Gln Gln His Cys Ala Gln Leu
65 70 75 80

Arg Asp Glu His Ser Trp Arg Tyr Asp His Pro Gly Asp Ala Ile Tyr
85 90 95

Ala Pro Leu Leu Leu Lys Gln Leu Ser Asp Arg Lys Pro Ala Asp Cys
100 105 110

Val Val Thr Thr Asp Val Gly Gln His Gln Met Trp Ala Ala Gln His
115 120 125

Ile Ala His Thr Arg Pro Glu Asn Phe Ile Thr Ser Ser Gly Leu Gly
130 135 140

Thr Met Gly Phe Gly Leu Pro Ala Ala Val Gly Ala Gln Val Ala Arg
145 150 155 160

Pro Asn Asp Thr Val Val Cys Ile Ser Gly
165 170



<210> 47 
<211> 35 
<212> DNA 

<213> Escherichia coli 
<400> 47 



gccttaatta accatgaggg aagcggtgat cgccg 35



<210> 48 
<211> 34 
<212> DNA 

<213> Escherichia coli 
<400> 48 

tgcggtcgac tttgccgact accttggtga tctc 34 

<210> 49
<211> 41
<212> DNA

<213> Escherichia coli
<400> 49

cccaagcttg gcgccatgag taaaggagaa gaacttttca c 41

<210> 50
<211> 36
<212> DNA

<213> Escherichia coli
<400> 50

gcgaccggtt tatttgtata gttcatccat gccatg 36

<210> 51
<211> 39
<212> DNA

<213> Escherichia coli
<400> 51

agggaattcg tcgacaaatt tgctgaatat tgcctgtct 39

<210> 52
<211> 38
<212> DNA

<213> Escherichia coli
<400> 52

ggcctcgagt tatttaattg tcccagcgtc aagtaatg 38

<210> 53
<211> 41
<212> DNA

<213> Escherichia coli
<400> 53

agctttgttt aaaccatggt taaagttatc ggtcgtagat c

<210> 54 
<211> 43 
<212> DNA 

<213> Escherichia coli 
<400> 54 

cagcgtcgac ggcgccgtgg gatttgttaa agcagttagc agc

<210> 55 
<211> 31 
<212> DNA 

<213> Escherichia coli 
<400> 55 

catgccatgg gggaagcggt gatcgccgaa g

<210> 56 
<211> 39 
<212> DNA 

<213> Escherichia coli 
<400> 56 

acgcgagctc ttatttaatt gtcccagcgt caagtaatg 

<210> 57 
<211> 34 
<212> DNA 

<213> Escherichia coli 
<400> 57 

cgaattctat ggttaaagtt atcggtcgta gatc

<210> 58 
<211> 36 
<212> DNA 

<213> Escherichia coli 
<400> 58 

agcccgcggt tatttgtata gttcatccat gccatg

<210> 59 
<211> 154 
<212> DNA 

<213> Nicotiana tabacum 
<400> 59 



gaatagatct acatacacct tggttgacac gagtatataa gtcatgttat actgttgaat 60
aacaagcctt ccattttcta ttttgatttg tagaaaacta gtgtgcttgg gagtccctga 120
tgattaaata aaccaagatt ttaccttaat taag 154



<210> 60 
<211> 151 
<212> DNA 

<213> Nicotiana tabacum 



<400> 60 

gatcctggcc tagtctatag gaggttttga aaagaaagga gcaataatca ttttcttgtt 60
ctatcaagag ggtgctattg ctcctttctt tttttctttt tatttattta ctagtatttt 120
acttacatag acttttttgt ttacgtattc t 151



<210> 61 
<211> 185 
<212> DNA 

<213> Nicotiana tabacum 



<400> 61 

catatggcgt ccatgatctc ctcgtccgcg gtgaccacgg tcagccgcgc gtccacggtg 60
cagtcggccg cggtggcccc gttcggcggc ctcaagtcca tgaccggctt cccggtcaag 120
aaggtcaaca cggacatcac gtccatcacg agcaacggcg gcagggtgaa gtgcatgcga 180
agagc 185



<210> 62 
<211> 6232 
<212> DNA 
<213> Unknown 



<220> 

<223> nucleotides 1-2492: E. coli vector pLITMUS28 (New
England Biolabs, Inc.)

<220>

<223> nucleotides 2493-5993: Nicotiana tabacum
<220> 

<223> Nucleotides 5993-6232: E.coli vector pLITMUS28 
(New England Biolabs, Inc.) 



<400> 62 

gttaactacg tcaggtggca cttttcgggg aaatgtgcgc ggaaccccta tttgtttatt 60
tttctaaata cattcaaata tgtatccgct catgagacaa taaccctgat aaatgcttca 120
ataatattga aaaaggaaga gtatgagtat tcaacatttc cgtgtcgccc ttattccctt 180
ttttgcggca ttttgccttc ctgtttttgc tcacccagaa acgctggtga aagtaaaaga 240
tgctgaagat cagttgggtg cacgagtggg ttacatcgaa ctggatctca acagcggtaa 300
gatccttgag agttttcgcc ccgaagaacg ttctccaatg atgagcactt ttaaagttct 360
gctatgtggc gcggtattat cccgtgttga cgccgggcaa gagcaactcg gtcgccgcat 420
acactattct cagaatgact tggttgagta ctcaccagtc acagaaaagc atcttacgga 480
tggcatgaca gtaagagaat tatgcagtgc tgccataacc atgagtgata acactgcggc 540
caacttactc ctgacaacga tcggaggacc gaaggagcta accgcttttt tgcacaacat 600
gggggatcat gtaactcgcc ttgatcgttg ggaaccggag ctgaatgaag ccataccaaa 660
cgacgagcgt gacaccacga tgcctgtagc aatggcaaca acgttgcgca aactattaac 720
tggcgaacta cttactctag cttcccggca acaattaata gactggatgg aggcggataa 780
agttgcagga ccacttctgc gctcggccct tccggctggc tggtttattg ctgataaatc 840
tggagccggt gagcgtgggt ctcgcggtat cattgcagca ctggggccag atggtaagcc 900
ctcccgtatc gtagttatct acacgacggg gagtcaggca actatggatg aacgaaatag 960
acagatcgct gagataggtg cctcactgat taagcattgg taactgtcag accaagttta 1020
ctcatatata ctttagattg atttaccccg gttgataatc agaaaagccc caaaaacagg 1080
aagattgtat aagcaaatat ttaaattgta aacgttaata ttttgttaaa attcgcgtta 1140
aatttttgtt aaatcagctc attttttaac caataggccg aaatcggcaa aatcccttat 1200
aaatcaaaag aatagcccga gatagggttg agtgttgttc cagtttggaa caagagtcca 1260
ctattaaaga acgtggactc caacgtcaaa gggcgaaaaa ccgtctatca gggcgatggc 1320
ccactacgtg aaccatcacc caaatcaagt tttttggggt cgaggtgccg taaagcacta 1380
aatcggaacc ctaaagggag cccccgattt agagcttgac ggggaaagcg aacgtggcga 1440
gaaaggaagg gaagaaagcg aaaggagcgg gcgctagggc gctggcaagt gtagcggtca 1500
cgctgcgcgt aaccaccaca cccgccgcgc ttaatgcgcc gctacagggc gcgtaaaagg 1560
atctaggtga agatcctttt tgataatctc atgaccaaaa tcccttaacg tgagttttcg 1620
ttccactgag cgtcagaccc cgtagaaaag atcaaaggat cttcttgaga tccttttttt 1680
ctgcgcgtaa tctgctgctt gcaaacaaaa aaaccaccgc taccagcggt ggtttgtttg 1740
ccggatcaag agctaccaac tctttttccg aaggtaactg gcttcagcag agcgcagata 1800
ccaaatactg ttcttctagt gtagccgtag ttaggccacc acttcaagaa ctctgtagca 1860
ccgcctacat acctcgctct gctaatcctg ttaccagtgg ctgctgccag tggcgataag 1920
tcgtgtctta ccgggttgga ctcaagacga tagttaccgg ataaggcgca gcggtcgggc 1980
tgaacggggg gttcgtgcac acagcccagc ttggagcgaa cgacctacac cgaactgaga 2040
tacctacagc gtgagctatg agaaagcgcc acgcttcccg aagggagaaa ggcggacagg 2100
tatccggtaa gcggcagggt cggaacagga gagcgcacga gggagcttcc agggggaaac 2160
gcctggtatc tttatagtcc tgtcgggttt cgccacctct gacttgagcg tcgatttttg 2220
tgatgctcgt caggggggcg gagcctatgg aaaaacgcca gcaacgcggc ctttttacgg 2280
ttcctggcct tttgctggcc ttttgctcac atgtaatgtg agttagctca ctcattaggc 2340
accccaggct ttacacttta tgcttccggc tcgtatgttg tgtggaattg tgagcggata 2400
acaatttcac acaggaaaca gctatgacca tgattacgcc aagctacgta atacgactca 2460
ctagtgggca gatcttcgaa tgcatcgcgc gcttgacgat atagcaattt tgcttggatt 2520
tatcagtcga agcaggagac aatatacctt gatattctcg atcattcttt gattcaaagc 2580
atcgttccat ctcaattgaa aaagcaaata acgtttcaag aacaaatcta gttctgcttc 2640
cgtgttgctt ttgtattgtt ttttcttttt acccttcttt gtgtctgatt ccgcgtaatc 2700
ttttttaaga gcgttttgat gttttgagag aacagggccc agatttcctt tgttttctat 2760
atctgatcca cgctcttttt ctccttgact tgcgggttct tttgcttctt gaattcgatt 2820
ctttattttt ttatttgatc gtagaaaaaa gttttgtttt tggtttttat tgatgttttt 2880
attttgacta acattttcat ttgtattcaa atttaaaaga agtaatttgc ttggtataat 2940
ccacggtttt attttatata cattataaag tggtacaaat tctgggaaga accaaaattc 3000
cagattcaat atgggacgat ttaatatttt ttcattcatt cccatccaat caaaaaaggc 3060
ttttttcgaa tttttttgat tgttttctgg attttgatga atcgtaagat aaaaaaagcc 3120
ttttttatca attttatcaa ttatttgata attattaata ccaattttag tatttggatt 3180
actgttggta tcgatcttaa cccaggcctc aatatcttct ttttgtctaa gagaaaaatg 3240
gataattttc caatcaaaat atrttctatc gagatttctt tctatatata gaatattgcc 3300
ttttcttaga taattattga tatgaagatt gccgagcata tcaaaaaggt tgtgtttgga 3360
cgtgttggaa ttagaagaaa tttcgaggtt cttatttact tgaaagggta atctagaaat 3420
aaaagagtca tttttttttt cataattaat cgatttatat gctaaaagat catatctata 3480
acatttttga aaattatctt tttggtttgc taatgaatag agctcagaat cattttcttt 3540
tttgtaatga attaattggt ctttttcata tgaattccat ttgtttaaat ttcgattttg 3600
agccatacaa ccttgattaa ccctatttcg ccatttttgt ggcattaatc tagaccatct 3660
aatctgagat aaatcgtacg agaatactca atcatgaata aatgcaagaa aataacctct 3720
ccttcttttt ctataatgta aacaaaaaag tctatgtaag taaaatacta gtaaataaat 3780
aaaaagaaaa aaagaaagga gcaatagcac cctcttgata gaacaagaaa atgattattg 3840
ctcctttctt ttcaaaacct cctatagact aggccaggat cctcgagctt aattaaggta 3900
aaatcttggt ttatttaatc atcagggact cccaagcaca ctagttttct acaaatcaaa 3960
atagaaaata gaaaatggaa ggctttttat tcaacagtat aacatgactt atatactcgt 4020
gtcaaccaag gtgtatgtag atctattcct gcaggatatc tggatccacg aagcttccca 4080
tgggaataga tctacataca ccttggttga cacgagtata taagtcatgt tatactgttg 4140
aataaaaagc cttccatttt ctattttgat ttgtagaaaa ctagtgtgct tgggagtccc 4200
tgatgattaa ataaaccaag attttaccgt ttaaacaccg gtgatcctgg cctagtctat 4260
aggaggtttt gaaaagaaag gagcaataat cattttcttg ttctatcaag agggtgctat 4320
tgctcctttc tttttttctt tttatttatt tactagtatt ttacttacat agactttttt 4380
gtttacatta tagaaaaaga aggagaggtt attttcttgc atttattcat gattgagtat 4440
tctcctaggc gtattgataa tgccgtctta accagttttt ccattgattg attctataac 4500
tctgaagttt cttatgtttt aattcagaat gaaatattcc tagtgttcga aaatagtcct 4560
ttattttagt cttaaggaaa aaagacgttc tgttatattg aagaacagat cttaatttag 4620
acaaattaat aacttggggt tgtgataatt tgtaaaatac atatgcttgt gataagtagg 4680
ataaatcaaa aaaaatatgt gaatttttct tactaatatt ataaagtgac ttttttatag 4740
tcgaaataaa gtgaattttt ttttgattat taattttttc ttgatttatt tcattattgg 4800
aaatgtattt atcaatcaat ttgtttgttg attcaagaaa gagttgtgta ttaattctgg 4860
gaatattaat gatagataaa aatagatcga tgtataatct ttgaatgaat aattttagaa 4920
aataatggaa tttccatatt aatcgagtat ttcttctttt taatatttgg aaaatctttt 4980
ttggcgattc gaatttttta atattatttg ttttattagg actaatgtct atttctggag 5040
ttactttctt tttctctttt gtaattcttt ctatttgatt tttgattgta cttgttctat 5100
cagtcaaatc cttcattttg ctttctatca gtgaagaatt tggccaattt ccagattcaa 5160
tttgactaaa tgattcgtta attatctgat tactcattag agaatctttt tcttttttcg 5220
tttcattcga ttcatctatt tctttgagtc taaataatac aattggattt acttttgaaa 5280
gttctttttt catttttttt ataaatagac tacttttgat aagccatttt ttggtttctt 5340
ttgaaattct tcgaaataat tttatttttc ctttgaaaac ttttagagtt ataaaatatt 5400
tctttttgaa ttttccaatt tttttttcga gttccttaaa aatgggctca aaaaaagaag 5460
ggcgttttcg gggagaacca aagggaagtt cagcttccat tccccaaact gttaaaaaac 5520
aaaaatcatc tttttgtttt ttctttttca ttagctctcc acgggaggag tacagtttag 5580
atatatgcca aggtttcaga caaaaaggaa ataatatttt gatctgaatg ccatctttca 5640
accaattttt tggaaattct gtttctgata attgaacacc attataagta catttaatat 5700
gcatttctct attccattcc tgcaaatctt cagaccattc aggaagttgc aagactaaca 5760
tacgcccgag atttttggct attatcaatg aaggtaatac aatatatttt cgaagaattg 5820
attgagttat taacatgtaa cctcttatta tttgcgcaaa aggaatggta tcccaggctt 5880
ctgctatctc tatccgtgct ttttcctttc ttttgttctc cccttttttg tccttttcct 5940
ttttctcttc tctttttgtt tgttcttctc tagactctag aatcttgaat tcggtaccct 6000
ctagtcaagg ccttaagtga gtcgtattac ggactggccg tcgttttaca acgtcgtgac 6060
tgggaaaacc ctggcgttac ccaacttaat cgccttgcag cacatccccc tttcgccagc 6120
tggcgtaata gcgaagaggc ccgcaccgat cgcccttccc aacagttgcg cagcctgaat 6180
ggcgaatggc gcttcgcttg gtaataaagc ccgcttcggc gggctttttt tt 6232

<210> 63 
<211> 6477 
<212> DNA 
<213> Unknown 

<220> 

<223> Nucleotides 1-2482: E. coli vector pLITMUS28 (New
England Biolabs, Inc.)

<220>

<223> Nucleotides 2493-6242: Nicotiana tabacum
<220> 

<223> Nucleotides 6243-6477: E. coli vector pLITMUS28 
(New England Biolabs, Inc.) 

<400> 63 

gttaactacg tcaggtggca cttttcgggg aaatgtgcgc ggaaccccta tttgtttatt 60 
tttctaaata cattcaaata tgtatccgct catgagacaa taaccctgat aaatgcttca 120 
ataatattga aaaaggaaga gtatgagtat tcaacatttc cgtgtcgccc ttattccctt 180 
ttttgcggca ttttgccttc ctgtttttgc tcacccagaa acgctggtga aagtaaaaga 240 
tgctgaagat cagttgggtg cacgagtggg ttacatcgaa ctggatctca acagcggtaa 300 
gatccttgag agttttcgcc ccgaagaacg ttctccaatg atgagcactt ttaaagttct 360 
gctatgtggc gcggtattat cccgtgttga cgccgggcaa gagcaactcg gtcgccgcat 420 
acactattct cagaatgact tggttgagta ctcaccagtc acagaaaagc atcttacgga 480 
tggcatgaca gtaagagaat tatgcagtgc tgccataacc atgagtgata acactgcggc 540 
caacttactt ctgacaacga tcggaggacc gaaggagcta accgcttttt tgcacaacat 600 
gggggatcat gtaactcgcc ttgatcgttg ggaaccggag ctgaatgaag ccataccaaa 660 
cgacgagcgt gacaccacga tgcctgtagc aatggcaaca acgttgcgca aactattaac 720 
tggcgaacta cttactctag cttcccggca acaattaata gactggatgg aggcggataa 780 
agttgcagga ccacttctgc gctcggccct tccggctggc tggtttattg ctgataaatc 840 
tggagccggt gagcgtgggt ctcgcggtat cattgcagca ctggggccag atggtaagcc 900 
ctcccgtatc gtagttatct acacgacggg gagtcaggca actatggatg aacgaaatag 960 
acagatcgct gagataggtg cctcactgat taagcattgg taactgtcag accaagttta 1020 
ctcatatata ctttagattg atttaccccg gttgataatc agaaaagccc caaaaacagg 1080 
aagattgtat aagcaaatat ttaaattgta aacgttaata ttttgttaaa attcgcgtta 1140 
aatttttgtt aaatcagctc attttttaac caataggccg aaatcggcaa aatcccttat 1200 
aaatcaaaag aatagcccga gatagggttg agtgttgttc cagtttggaa caagagtcca 1260 
ctattaaaga acgtggactc caacgtcaaa gggcgaaaaa ccgtctatca gggcgatggc 1320 
ccactacgtg aaccatcacc caaatcaagt tttttggggt cgaggtgccg taaagcacta 1380 
aatcggaacc ctaaagggag cccccgattt agagcttgac ggggaaagcg aacgtggcga 1440 
gaaaggaagg gaagaaagcg aaaggagcgg gcgctagggc gctggcaagt gtagcggtca 1500 
cgctgcgcgt aaccaccaca cccgccgcgc ttaatgcgcc gctacagggc gcgtaaaagg 1560 
atctaggtga agatcctttt tgataatctc atgaccaaaa tcccttaacg tgagttttcg 1620 
ttccactgag cgtcagaccc cgtagaaaag atcaaaggat cttcttgaga tccttttttt 1680
ctgcgcgtaa tctgctgctt gcaaacaaaa aaaccaccgc taccagcggt ggtttgtttg 1740

ccggatcaag agctaccaac tctttttccg aaggtaactg gcttcagcag agcgcagata 1800 

ccaaatactg ttcttctagt gtagccgtag ttaggccacc acttcaagaa ctctgtagca 1860 

ccgcctacat acctcgctct gctaatcctg ttaccagtgg ctgctgccag tggcgataag 1920 

tcgtgtctta ccgggttgga ctcaagacga tagttaccgg ataaggcgca gcggtcgggc 1980 

tgaacggggg gttcgtgcac acagcccagc ttggagcgaa cgacctacac cgaactgaga 2040 

tacctacagc gtgagctatg agaaagcgcc acgcttcccg aagggagaaa ggcggacagg 2100 

tatccggtaa gcggcagggt cggaacagga gagcgcacga gggagcttcc agggggaaac 2160 

gcctggtatc tttatagtcc tgtcgggttt cgccacctct gacttgagcg tcgatttttg 2220 

tgatgctcgt caggggggcg gagcctatgg aaaaacgcca gcaacgcggc ctttttacgg 2280 

ttcctggcct tttgctggcc ttttgctcac atgtaatgtg agttagctca ctcattaggc 2340 

accccaggct ttacacttta tgcttccggc tcgtatgttg tgtggaattg tgagcggata 2400 

acaatttcac acaggaaaca gctatgacca tgattacgcc aagctacgta atacgactca 2460

ctagtgggca gatcttcgaa tgcatcgcgc gcaattcacc gccgtatggc tgaccggcga 2520 

ttactagcga ttccggcttc atgcaggcga gttgcagcct gcaatccgaa ctgaggacgg 2580 

gtttttgggg ttagctcacc ctcgcgggat cgcgaccctt tgtcccggcc attgtagcac 2640 

gtgtgtcgcc cagggcataa ggggcatgat gacttgacgt catcctcacc ttcctccggc 2700 

ttatcaccgg cagtctgttc agggttccaa actcaacgat ggcaactaaa cacgagggtt 2760 

gcgctcgttg cgggacttaa cccaacacct tacggcacga gctgacgaca gccatgcacc 2820 

acctgtgtcc gcgttcccga aggcacccct ctctttcaag aggattcgcg gcatgtcaag 2880 

ccctggtaag gttcttcgct ttgcatcgaa ttaaaccaca tgctccaccg cttgtgcggg 2940 

cccccgtcaa ttcctttgag tttcatnctt gcgaacgtac tccccaggcg ggatacttaa 3000

cgcgttagct acagcactgc acgggtcgat acgcacagcg cctagtatcc atcgtttacg 3060 

gctaggacta ctggggtatc taatcccatt cgctccccta gctttcgtct ctcagtgtca 3120 

gtgtcggccc agcagagtgc tttcgccgtt ggtgttcttt ccgatctcta cgcatttcac 3180

cgctccaccg gaaattccct ctgcccctac cgtactccag cttggtagtt tccaccgcct 3240 

gtccagggtt gagccctggg atttgacggc ggacttaaaa agccacctac agacgcttta 3300 

cgcccaatca ttccggataa cgcttgcatc ctctgtatta ccgcggctgc tggcacagag 3360 

ttagccgatg cttattcccc agataccgtc attgcttctt ctccgggaaa agaagttcac 3420 

gacccgtggg ccttctacct ccacgcggca ttgctccgtc agctttcgcc cattgcggaa 3480 

aattccccac tgctgcctcc cgtaggagtc tgggccgtgt ctcagtccca gtgtggctga 3540 

tcatcctctc ggaccagcta ctgatcatcg ccttggtaag ctattgcctc accaactagc 3600 

taatcagacg cgagcccctc ctcgggcgga ttcctccttt tgctcctcag cctacggggt 3660 

attagcagcc gtttccagct gttgttcccc tcccaagggc aggttcttac gcgttactca 3720 

cccgtccgcc actggaaaca ccacttcccg tccgacttgc atgtgttaag catgccgcca 3780 

gcgttcatcc tgagccagga tcgaactctc catgagattc atagttgcat tacttatagc 3840 

ttccttgttc gtagacaaag cggattcgga attgtctttc attccaaggc ataacttgta 3900

tccatgcgct tcatattcgc ccggagttcg ctcccagaaa tatagccatc cctgccccct 3960 

cacgtcaatc ccacgagcct cttatccatt ctcattgaac gacggcgggg gagcaaatcc 4020 

aactagaaaa actcacattg ggcttaggga taatcaggct cgaactgatg acttccacca 4080 

cgtcaaggtg acactctacc gctgagttat atcccttccc cgccccatcg agaaatagaa 4140 

ctgactaatc ctaagtcaaa ggcgtacgag aatactcaat catgaataaa tgcaagaaaa 4200 

taacctctcc ttctttttct ataatgtaaa caaaaaagtc tatgtaagta aaatactagt 4260 

aaataaataa aaagaaaaaa agaaaggagc aatagcaccc tcttgataga acaagaaaat 4320 

gattattgct cctttctttt caaaacctcc tatagactag gccaggatcc tcgagcttaa 4380 

ttaaggtaaa atcttggttt atttaatcat cagggactcc caagcacact agttttctac 4440 

aaaccaaaat agaaaataga aaatggaagg ctttttattc aacagtataa catgacttat 4500 

atactcgtgt caaccaaggt gtatgtagac ctattcctgc aggatatctg gatccacgaa 4560
gcttcccatg ggaatagatc tacatacacc ttggttgaca cgagtatata agtcatgtta 4620
tactgttgaa taaaaagcct tccattttct attttgattt gtagaaaact agtgtgcttg 4680 
ggagtccctg atgattaaat aaaccaagat tttaccgttt aaacaccggt gatcctggcc 4740 
tagtctatag gaggttttga aaagaaagga gcaataatca ttttcttgtt ctatcaagag 4800 
ggtgctattg ctcctttctt tttttctttt tatttattta ctagtatttt acttacatag 4860 
acttttttgt ttacattata gaaaaagaag gagaggttat tttcttgcat ttattcatga 4920 
ttgagtattc tcctagggtc gagaaactca acgccactat tcttgaacaa cttggagccg 4980 
ggccttcttt tcgcactatt acggatatga aaataatggt caaaatcgga ttcaattgtc 5040 
aactgcccct atcggaaata ggattgacta ccgattccga aggaactgga gttacatctc 5100 
ttttccattc aagagttctt atgcgtttcc acgccccttt gagaccccga aaaatggaca 5160 
aattcctttt cttaggaaca catacaagat tcgtcactac aaaaaggata atggtaaccc 5220 
taccattaac tacttcattt atgaatttca tagtaataga aatacatgtc ctaccgagac 5280 
agaatttgga acttgctatc ctcttgccta gcaggcaaag atttacctcc gtggaaagga 5340 
tgattcattc ggatcgacat gagagtccaa ctacattgcc agaatccatg ttgtatattt 5400 
gaaagaggtt gacctccttg cttctctcat ggtacactcc tcttcccgcc gagccccttt 5460 
tctcctcggt ccacagagac aaaatgtagg actggtgcca acaattcatc agactcacta 5520 
agtcgggatc actaactaat actaatctaa tataatagtc taatatatct aatataatag 5580 
aaaatactaa tataatagaa aagaactgtc ttttctgtat actttccccg gttccgttgc 5640 
taccgcgggc tttacgcaat cgatcggatt agatagatat cccttcaaca taggtcatcg 5700 
aaaggatctc ggagacccac caaagtacga aagccaggat ctttcagaaa acggattcct 5760
attcaaagag tgcataaccg catggataag ctcacactaa cccgtcaatt tgggatccaa 5820 
attcgagatt ttccttggga ggtatcggga aggatttgga atggaataat atcgattcat 5880 
acagaagaaa aggttctcta ttgattcaaa cactgtacct aacctatggg atagggatcg 5940 
aggaagggga aaaaccgaag atttcacatg gtacttttat caatctgatt tatttcgtac 6000 
ctttcgttca atgagaaaat gggtcaaatt ctacaggatc aaacctatgg gacttaagga 6060 
atgatataaa aaaaagagag ggaaaatatt catattaaat aaatatgaag tagaagaacc 6120 
cagattccaa atgaacaaat tcaaacttga aaaggatctt ccttattctt gaagaatgag 6180 
gggcaaaggg attgatcaag aaagatcttt tgttcttctt atatataaga tcgtgatggt 6240 
accctctagt caaggcctta agtgagtcgt attacggact ggccgtcgtt ttacaacgtc 6300 
gtgactggga aaaccctggc gttacccaac ttaatcgcct tgcagcacat ccccctttcg 6360 
ccagctggcg taatagcgaa gaggcccgca ccgatcgccc ttcccaacag ttgcgcagcc 6420 
tgaatggcga atggcgcttc gcttggtaat aaagcccgct tcggcgggct ttttttt 6477 



<210> 64 
<211> 31 
<212> DNA 

<213> Nicotiana tabacum 
<400> 64 

aactgcagga atagatctac atacaccttg g 31

<210> 65
<211> 42
<212> DNA

<213> Nicotiana tabacum
<400> 65

ccgctcgagc ttaattaagg taaaatcttg gtttatttaa tc 42

<210> 66
<211> 33
<212> DNA 

<213> Nicotiana tabacum 
<400> 66 

gcgaccggtg atcctggcct agtctatagg agg 33

<210> 67 
<211> 34 
<212> DNA 

<213> Nicotiana tabacum 
<400> 67 

aggcctagga gaatactcaa tcatgaataa atgc 34 

<210> 68 
<211> 34 
<212> DNA 

<213> Nicotiana tabacum 
<400> 68 

ttggcgcgct tgacgatata gcaattttgc ttgg 34 

<210> 69 
<211> 34 
<212> DNA 

<213> Nicotiana tabacum 
<400> 69 

ttgcgtacga tttatctcag attagatggt ctag 34 

<210> 70 
<211> 35 
<212> DNA 

<213> Nicotiana tabacum 
<400> 70 

ttgcctaggc gtattgataa tgccgtctta accag 35

<210> 71 
<211> 34 
<212> DNA 

<213> Nicotiana tabacum 
<400> 71 

aggggtaccg aattcaagat tctagagtct agag 34

<210> 72
<211> 34 
<212> DNA 

<213> Nicotiana tabacum 
<400> 72 

ttggcgcgca attcaccgcc gtatggctga ccgg 34 

<210> 73 
<211> 34 
<212> DNA 

<213> Nicotiana tabacum 
<400> 73 

ttgcgtacgc ctttgactta ggattagtca gttc 34 

<210> 74 
<211> 34 
<212> DNA 

<213> Nicotiana tabacum 
<400> 74 

ttgcctaggg tcgagaaact caacgccact attc 34 

<210> 75 
<211> 35 
<212> DNA 

<213> Nicotiana tabacum 
<400> 75 

aggggtacca tcacgatctt atatataaga agaac 35 

<210> 76 
<211> 250 
<212> DNA 

<213> Nicotiana tabacum 
<400> 76 

gaattgtgag cgctcacaat tctaggatgt taattgcgcc gacatcataa cggttctggc 60 
aaatattctg aaatgagctg ttgacaatta atcatcggct cgtataatgt gtggaattgt 120 
gagcggataa caatttcaca caggaaacag accatggtga attctagagc tcgaggatcc 180 
gcggtacccg ggcatgcatt cgaagcttcc ttaagcggcc gtcgaccgat gcccttgaga 240 
gccttcaacc 250 

<210> 77 
<211> 5 
<212> PRT

<213> Artificial Sequence
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 77 

Cys Leu Asn Ile Gln
1 5 



<210> 78 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 78 

Val Phe Lys His Ala 
1 5 



<210> 79 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 79 

Leu Phe Lys Gln Pro

1 5 



<210> 80 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon

<400> 80

Cys Leu Asn Ser Asp 
1 5 



<210> 81 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the
ends of the Tn7 transposon

<400> 81

Cys Leu Asn Ile Ser
1 5 



<210> 82 
<211> 5 
<212> PRT 

<213> Artificial Sequence
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 82 

Cys Leu Asn Thr Asp 
1 5 



<210> 83 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 83 

Cys Leu Asn Asn Arg 
1 5 



<210> 84
<211> 5
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 84 

Cys Leu Asn Ser Cys 
1 5 



<210> 85 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 85 

Cys Leu Asn Ser Asp 
1 5 



<210> 86 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 86 

Cys Leu Asn Thr Leu 
1 5 



<210> 87 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the
ends of the Tn7 transposon

<400> 87

Val Phe Lys Gln Pro
1 5 



<210> 88 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 88 

Cys Leu Asn Ser Met 
1 5 



<210> 89 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the
ends of the Tn7 transposon 

<400> 89 

Cys Leu Asn Asn Tyr 

1 5 



<210> 90 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 90 

Cys Leu Asn Met Ala 
1 5

<210> 91
<211> 5
<212> PRT

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 91 

Val Phe Lys His Lys 
1 5 



<210> 92 
<211> 5 
<212> PRT 

<213> Artificial Sequence
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 92 

Cys Leu Asn Thr Lys 
1 5 



<210> 93 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 93 

Cys Leu Asn Lys Asp 

1 5 



<210> 94 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220>

<223> Description of Artificial Sequence: based on the
ends of the Tn7 transposon 

<400> 94 

Met Phe Lys Gln Ile
1 5 



<210> 95 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 95 

Cys Leu Asn Ile Ile
1 5 



<210> 96 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 96 

Leu Phe Lys His Glu 
1 5 



<210> 97 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 97 

Val Phe Lys His Phe 
1 5



<210> 98 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon

<400> 98 

Cys Leu Asn Ser Val 

1 5 



<210> 99 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 99 

Val Phe Lys Gln Ile
1 5 



<210> 100 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 100 

Met Phe Lys Gln Ala

1 5 



<210> 101 
<211> 5
<212> PRT 



WO 00/71701

PCT/US00/14122



<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 101 

Leu Phe Lys His His 
1 5 



<210> 102 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 102 

Leu Phe Lys His Gln
1 5 



<210> 103 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 103 

Met Phe Lys His Val 
1 5 



<210> 104 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon

<400> 104

Val Phe Lys Gln Lys
1 5 



<210> 105 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 105 

Leu Phe Lys Gln Gln
1 5 



<210> 106 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 106 

Leu Phe Lys His Ser 
1 5 



<210> 107 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 107 

Cys Leu Asn Thr Gly 
1 5 



<210> 108 
<211> 5
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 108 

Cys Leu Asn Ser Arg 
1 5 



<210> 109 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 109 

Val Phe Lys His Leu 
1 5 



<210> 110 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 110 

Cys Leu Asn Asn Ile

1 5 



<210> 111 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon

<400> 111

Leu Phe Lys His Gln

1 5 



<210> 112 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 112 

Cys Leu Asn Lys His 
1 5 



<210> 113 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 113 

Met Phe Lys Gln Tyr
1 5 



<210> 114 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 114 

Cys Leu Asn Lys Gln
1 5 






<210> 115 
<211> 5 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 115 

Cys Leu Asn Met Ser 
1 5 



<210> 116 
<211> 7 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 116 

Leu Cys Leu Asn Ile Leu Ala
1 5 



<210> 117 
<211> 7 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 117 

Asn Cys Leu Asn Ile Asn Ala
1 5 



<210> 118 
<211> 7 
<212> PRT 

<213> Artificial Sequence 
<220>

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 118 

Leu Met Phe Lys His Leu Ser 
1 5 



<210> 119 
<211> 7 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 119 

Thr Leu Phe Lys His Thr Arg 
1 5 



<210> 120 

<211> 7

<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 120 

Lys Val Phe Lys Gln Lys Glu
1 5 



<210> 121 
<211> 7 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 121 

His Leu Val Phe Lys His Leu 
1 5

<210> 122
<211> 7
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 122 

Leu Cys Leu Asn Thr Leu Leu 
1 5 



<210> 123 
<211> 7 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon

<400> 123 

Leu Cys Leu Asn Asn Leu Val 

1 5 



<210> 124 
<211> 7 
<212> PRT 

<213> Artificial Sequence 

<220>
<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 124 

Glu Val Phe Lys His Glu Gly 

1 5 



<210> 125 
<211> 7 
<212> PRT 
<213> Artificial Sequence
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 125 

Lys Val Phe Lys Gln Lys Gly
1 5 



<210> 126 
<211> 7 
<212> PRT

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 126 

Thr Cys Leu Asn Thr Thr Ile
1 5 



<210> 127 
<211> 7 
<212> PRT

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 127 

Met Cys Leu Asn Asn Met Asn 
1 5 



<210> 128 
<211> 7 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the
ends of the Tn7 transposon

<400> 128

Leu Leu Phe Lys Gln Leu Arg
1 5 



<210> 129 
<211> 7 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 129 

Arg Cys Leu Asn Asn Arg Leu 
1 5 



<210> 130 
<211> 7 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 130 

Met Val Phe Lys Gln Met Ala
1 5 



<210> 131 
<211> 7 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 131 

Ala Met Phe Lys Gln Ala Thr
1 5 



<210> 132 
<211> 7
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the
ends of the Tn7 transposon 

<400> 132 

Leu Val Phe Lys His Leu Asp 
1 5 



<210> 133 
<211> 7 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 133 

Lys Met Phe Lys Gln Lys Thr
1 5 



<210> 134 
<211> 7 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: based on the 
ends of the Tn7 transposon 

<400> 134 

Tyr Cys Leu Asn Asn Tyr Phe 

1 5